1.0  INTRODUCTION

The  purpose  of  the  Wide-Band  Signal  Processor  (WBSP)  study 
was  to  define  an  architecture  for  a highly  reliable  spaceborne 
communications  processor.  The  processor  was  to  be  designed  to 
perform  the  demodulation,  decoding,  de-interleaving,  interleaving, 
encoding,  formatting  and  routing  tasks  anticipated  for  the  SSS 
satellites.  The  goal  was  to  define  a system  having  a long-term 
reliability comparable to that of the Fault-Tolerant Spaceborne
Computer  (FTSC)  and  having  a comparably  low  redundancy  overhead. 

This  report  summarizes  the  results  of  that  study. 

The FTSC itself served as the starting point for the study
for  two  reasons: 

•Considerable  effort  had  gone  into  the  FTSC  design  to 
make  it  truly  fault  tolerant  and,  in  particular,  to 
ensure  that  faults  could  be  isolated  and  the  faulty 
module  replaced.  The  ability  to  isolate  faults  is 
crucial  to  any  fault-tolerant  design;  it  was  clearly 
advantageous,  therefore,  to  appropriate  for  the  WBSP  all 
the  applicable  fault-tolerant  features  of  the  FTSC. 

•Roughly  twenty  new,  radiation  hardened,  CMOS/SOS  LSI 
devices  are  being  developed  for  the  FTSC.  The  WBSP 
development  cost  could  be  significantly  reduced  by 
designing  it  to  be  implemented  to  the  greatest  extent 
possible  with  these  FTSC  devices. 

The  WBSP  architecture  resulting  from  this  effort  is  described 
in  the  following  section.  It  is  designed  to  operate  as  a peripheral 
to the FTSC (cf. Figure 1-1).

Three types of processors are used in the WBSP: a core 8-bit
microprocessor and two variations. The core processor is
described  in  Section  3 along  with  the  various  microinstruction 
algorithms  that  have  been  developed  for  it.  The  two  variations, 
the  demodulator  processor  and  the  de-interleaver  processor,  are 
discussed  in  Sections  4 and  5,  respectively.  Reliability,  power 
and weight estimates for the WBSP and the FTSC are presented in
Section 6 along with an assessment of the magnitude of the
processing tasks (navigation, guidance, control and general
housekeeping tasks as well as communications tasks) relegated to
the FTSC. Section 7 concludes with an assessment of the results
to date and recommendations for further work.

2.0  WBSP SYSTEM CONFIGURATION

One of the most important lessons learned during the early
phases of the FTSC study concerned the dramatic improvement in
reliability that could be achieved by pooling spares. A spare
module capable of replacing any one of, say, five active modules
in the event of a failure improves the system reliability nearly
as much as if each of the five modules had a dedicated spare; yet
the latter configuration obviously requires five times as much
redundant  hardware.  Two  conditions  must  be  satisfied,  however, 
in  order  to  exploit  this  potential: 

•The  switching  device  used  to  isolate  the  faulty 
module  and  to  replace  it  with  one  of  the  pooled 
spares  must  be  so  designed  that  its  unreliability 
does  not  dissipate  much  of  the  potential  of  this 
approach.

•The  processor  must  be  partitioned  to  the  largest 
extent  possible  into  interchangeable  modules  so 
that  spares  can  in  fact  be  pooled. 

The  first  of  these  requirements  was  addressed  extensively  in 
the  FTSC  design  and  the  results  of  this  effort  were  adapted  with 
only minor modifications for the WBSP. The second requirement
was relatively easily satisfied in the WBSP due to the fact that
it is inherently a multichannel processor; i.e., the processor
must perform similar operations on a number of parallel channels.
This fact alone suggests a multiprocessor configuration. As will
be  seen,  the  WBSP  architecture  takes  full  advantage  of  this 
inherent  parallelism. 
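
The benefit of pooling claimed at the start of this section can be checked with a short calculation. The sketch below is illustrative only: it assumes independent module failures, a perfect switching device, and a spare that is exactly as reliable as an active module. The module reliability of 0.99 and all function names are hypothetical and do not come from the study.

```python
from math import comb


def k_of_n(r: float, k: int, n: int) -> float:
    """Probability that at least k of n independent modules survive."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def no_spares(r: float, active: int = 5) -> float:
    """All active modules must survive."""
    return r**active


def pooled_spares(r: float, active: int = 5, spares: int = 1) -> float:
    """Any 'active' of the 'active + spares' interchangeable modules must survive."""
    return k_of_n(r, active, active + spares)


def dedicated_spares(r: float, active: int = 5) -> float:
    """Each active module has its own spare; every pair must keep one survivor."""
    return (1 - (1 - r) ** 2) ** active


if __name__ == "__main__":
    r = 0.99  # assumed module reliability over the mission (hypothetical)
    base = no_spares(r)
    pooled_gain = pooled_spares(r) - base
    dedicated_gain = dedicated_spares(r) - base
    print(f"fraction of dedicated-spare gain achieved: {pooled_gain / dedicated_gain:.3f}")
```

Under these assumptions the single pooled spare recovers most of the reliability gain of five dedicated spares while adding one module instead of five. The unreliability of the switching device, which the first condition above addresses, is deliberately left out of the sketch.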

The initial candidate WBSP architecture consisted of a
number of identical processors and a memory system all attached
to a single bus structure. The active processors were assigned
various functions (e.g., demodulation, decoding) under FTSC
control.  Data  were  transferred  from  one  processor  or  memory  to 
another  (e.g.,  from  a demodulator  to  the  de-interleaver  memory) 
under  the  control  of  a dedicated  bus  controller.  When  a fault 
was  detected,  the  FTSC  intervened,  isolating  the  faulty  module 
and  programming  a spare  module  to  take  over  its  function. 

The major disadvantage of the single-bus configuration was
that the number of loads on the bus became quite large. The
resulting excessive bus capacitance limited the throughput of a
bus that also had to meet high throughput requirements. Several
modifications of the original configuration were investigated in
an attempt to overcome this problem. The most promising
configuration, and the one to be described in the remainder of
this report, involves four buses, all controlled by a single
(dual-redundant) controller.

The  four-bus  configuration  obviously  reduces  both  the  number  of 
loads  and  the  throughput  requirements  on  each  bus.  It  does 
restrict  the  extent  to  which  spares  can  be  pooled,  however,  since 
a module  can  serve  as  a spare  only  for  those  modules  interfacing 
with the same bus. This disadvantage is compensated for by the
fact that the modules associated with each bus can be specifically
tailored to a more restricted set of processing tasks, thus
increasing their efficiency. The advantages of this feature of
the recommended WBSP configuration will become apparent in the
following discussion.

The selected WBSP configuration is shown in Figure 2-1. Each
of the four buses consists of an eight-bit data byte, an eight-bit
address byte, a data parity bit, an address parity bit and a
spare byte to be used to replace any failed data or address byte
or parity segment. Associated with each bus are two triple-modular
redundant (TMR) control lines. One is used by the
controller to indicate that a valid address is on the address bus;
the second is used to distinguish between hard and soft address
modes (see below).

In normal operation, the bus controller simply reads from
its own memory a sequence of addresses for each of the four address
buses, gates these addresses onto the buses and raises the
valid-address control signal, thereby initiating a new bus cycle. The
most significant half of the address designates the address of
the module that is to transmit onto the data bus during the
following bus cycle; the least significant half designates the
module that is to receive the data currently being transmitted.
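
The overlapped addressing just described can be sketched as follows. This is a minimal illustration, not the controller design: it assumes that each "half" of the eight-bit address is a four-bit nibble, and the function names are hypothetical.

```python
def split_address(address: int) -> tuple[int, int]:
    """Split an 8-bit bus address into (next transmitter, current receiver).

    Assumption: each "half" of the address is a 4-bit nibble.
    """
    if not 0 <= address <= 0xFF:
        raise ValueError("address must fit in eight bits")
    next_transmitter = (address >> 4) & 0xF  # most significant half
    current_receiver = address & 0xF         # least significant half
    return next_transmitter, current_receiver


def trace_transfers(addresses: list[int]) -> list[tuple[int, int]]:
    """Return the (transmitter, receiver) pair for each cycle that moves data.

    The transmitter named in one bus cycle drives the data bus in the
    following cycle, where the new address names its receiver.
    """
    transfers = []
    transmitter = None  # no module has been told to transmit before the first cycle
    for address in addresses:
        next_transmitter, receiver = split_address(address)
        if transmitter is not None:
            transfers.append((transmitter, receiver))
        transmitter = next_transmitter
    return transfers
```

In this model the receiver field of the first address and the transmitter field of the last address in a sequence are unused, which reflects the one-cycle overlap between naming a transmitter and naming its receiver.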

The  addresses  used  in  normal  operation  are  referred  to  as 
"soft"  addresses.  These  addresses  designate  functions  rather  than 
modules. That is, a module responds to a given soft address only
if it has been previously programmed to implement the function
corresponding to that soft address. This initialization is done
using "hard" addresses. Each module is permanently assigned a
unique (for a given bus) hard, or physical, address to which it
responds  only  in  the  hard  address  mode.  This  mode  is  used  only 
to  test  and  to  reconfigure  the  WBSP.  When  a fault  is  detected 
either  by  the  FTSC  or  by  the  fault  monitors  (decoders)  associated 
with  each  module  interface,  the  FTSC  is  interrupted  and,  in 
effect,  takes  over  control  of  the  WBSP.  By  transmitting  hard 
addresses over the WBSP address buses, it can test any specified
module,  deactivate  it  if  necessary,  activate  a spare  module  and 
program  it  (by  loading  its  control  memory  with  the  appropriate 
microinstructions)  to  take  over  the  function  vacated  by  the  failed 
module.  The  FTSC  then  allows  the  WBSP  to  resume  normal  operation. 
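
The distinction between soft and hard addresses, and the recovery sequence carried out by the FTSC, can be modeled in a few lines. The sketch below is a simplified illustration under stated assumptions: the class and function names are hypothetical, the control memory is represented as a plain list, and fault detection and module testing are omitted.

```python
class Module:
    """One WBSP module on a bus (illustrative model only)."""

    def __init__(self, hard_address: int):
        self.hard_address = hard_address  # permanent physical address
        self.soft_address = None          # function address, assigned when programmed
        self.active = False
        self.control_ram = []

    def responds_to(self, address: int, hard_mode: bool) -> bool:
        if hard_mode:
            # Hard addresses select a physical module, active or not.
            return address == self.hard_address
        # Soft addresses select whichever active module implements the function.
        return self.active and address == self.soft_address


def reconfigure(modules, failed_hard: int, spare_hard: int, microcode) -> None:
    """Deactivate a failed module and program a spare to take over its function."""
    by_hard = {m.hard_address: m for m in modules}
    failed, spare = by_hard[failed_hard], by_hard[spare_hard]
    function = failed.soft_address
    failed.active = False
    failed.soft_address = None
    spare.control_ram = list(microcode)  # load the spare's control memory
    spare.soft_address = function
    spare.active = True
```

Because the bus controller's normal address sequence refers only to soft addresses, it does not need to change after a reconfiguration of this kind.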

(The bus structure is critical to the fault tolerance of
the  system.  Provisions  must  be  made,  for  example,  to  ensure  that 
failed  modules  can  be  deactivated,  regardless  of  the  nature  of 
the  failure;  that  once  a failed  module  is  deactivated,  it  cannot 
disable  either  of  the  buses  with  which  it  communicates,  etc.  All 
of  these  contingencies  were  successfully  addressed  in  the  FTSC 
design.  Accordingly,  except  for  the  modifications  needed  to 
accommodate  the  narrower  WBSP  buses,  the  WBSP  bus  structure  is 
identical  to  that  used  in  the  FTSC  and  is  implemented  exclusively 
with  LSI  devices  being  developed  for  the  FTSC.) 

In normal operation (cf. Figure 2-1), the bus controller
routes  information  from  the  analog-to-digital  converters  to  the 
demodulator  processors;  from  there  the  demodulated  (soft-decision) 
chip  data  are  routed  to  the  appropriate  de-interleaver  processor 
and  then  to  one  of  the  decoder  processors.  The  decoded  data  is 
then  stored  into  the  FTSC's  main  memory  where  each  frame  is 
tested  for  integrity  (by  checking  the  message  "tails")  and 
duplicates  of  messages  are  eliminated.  If  a tail  check  indicates 
a malfunctioning  WBSP  module,  the  FTSC  initiates  the  appropriate 
fault  diagnostic  routine  and,  if  necessary,  reconfigures  the  WBSP. 
Otherwise,  the  processed  message  is  sent  back  to  the  WBSP  to  one 
of its de-interleaver processors where it is encoded and interleaved.
From there, the bus controller routes it to one of the
decoder  processors  where,  if  appropriate,  it  is  modified  using  a 
Queen's  code  and  cover  sequence.  It  is  then  sent  to  the  appropriate 
modulator  for  down-link  or  cross-link  transmission. 

As  indicated  in  Figure  2-1,  the  A/D  converters  and  the 
modulators  are  configured  in  dual-redundant  pairs,  with  each  pair 
dedicated  to  a particular  r.f.  link.  This  configuration  precludes 
the  pooling  of  spares  and  hence  results  in  a less  efficient 
utilization of redundancy than would otherwise be possible.

Pooling  spare  A/D  converters  or  modulators  would  require  analog 
multiplexers  between  these  devices  and  their  analog  interfaces. 
Although this option need not be ruled out, it should be noted
that both the A/D converters and the modulators are relatively
simple devices; thus, the fact that redundancy is added somewhat
inefficiently is not nearly as significant as it would be, for
example, for the considerably more complex demodulator or decoder
processors.

The  communication  link  between  the  WBSP's  bus  controller  and 
the  FTSC  could  utilize  either  one  of  the  FTSC's  direct  memory 
access  (DMA)  ports  or  one  of  its  serial  bus  device  interface  units 
(DIUs). If the DMA port is not pre-empted for some other purpose
and  if  the  FTSC  and  WBSP  are  to  be  deployed  in  reasonably  close 
proximity,  the  DMA  interface  is  preferable  since  it  enables  more 
rapid  communication  between  the  two  systems  and  hence  shortens  the 
time  needed  for  diagnosis  and  recovery.  Since  the  DIU  port  is 
fast  enough  for  normal  operation,  however,  the  only  penalty  in 
using it is a somewhat longer recovery period following a WBSP
failure.

The only fault monitors currently envisioned for the WBSP
are the bus decoders provided at each bus interface and the "tail
checking" in the FTSC already mentioned. Since the data are
highly  encoded,  it  is  felt  that  the  tail  checks  by  themselves  will 
provide  an  adequate  means  of  monitoring  the  health  of  the  system. 
The  main  function  of  the  bus  decoders  is  to  help  the  FTSC  isolate 
faults  involving  inter-module  communication.  If  additional  fault 
monitoring  is  found  to  be  desirable,  it  would  be  relatively  simple 
to add one processor on each bus and to program it to monitor, on
a time shared basis, the performance of each of the other processors
on the same bus. (It should be noted that the monitoring processor
can in general execute considerably simpler algorithms than the
processor being monitored and still thoroughly check the latter's
performance.)

The  bulk  of  the  processing  capability,  and  hence  of  the 
system  complexity,  in  the  WBSP  resides  in  the  three  sets  of 
processors  (demodulator  processors,  interleaver/de-interleaver 
processors,  and  encoder/decoder  processors)  shown  in  Figure  2-1. 

Each  of  these  three  processor  types  is  discussed  in  turn  in  the 
following  three  sections. 

3.0  THE CORE PROCESSOR

The  three  types  of  processors  used  in  the  WBSP  consist  of  a 
core processor (the encoder/decoder processor) and two modifications
of that processor (the demodulator and interleaver/de-interleaver
processors). The core processor is discussed in this
section  and  the  other  two  processors  in  the  following  two  sections. 
Paragraph  3.1  describes  the  architecture  of  the  core  processor. 

The  following  paragraphs  describe  four  different  signal  processing 
applications  for  such  a processor:  a Viterbi  decoder  metric 
processor;  a Viterbi  decoder  path  processor;  a dual-3  decoder  and 
a feedback decoder. The specific function to be implemented by
the processor is determined through an initialization phase
during which its control RAM is loaded with the appropriate
microinstructions. The processor then accepts information received
over  the  bus,  processes  that  information,  stores  the  result  in 
its  output  buffer  and  waits  for  a new  input  before  repeating  the 
same  sequence.  The  processor  is  interrupted  each  time  the  input 
buffer  receives  a new  word  of  information.  Since  the  entire 
system  uses  a common  clock,  this  interrupt  mechanism  is  sufficient 
to  keep  the  system  in  synchronism. 

3.1  GENERAL  DESCRIPTION 

The  core  processor  consists  of  an  interface  section,  a 
control  section  and  a processing  section.  The  block  diagram 
shown  in  Figure  3-1  shows  the  LSI  devices  which  comprise  these 
sections.  The  interface  section  of  the  core  processor 
consists  of  6 LSI  chips  of  three  types,  including  two  Control 
Interface  chips.  The  control  section  consists  of  the  I/O  Buffer 
and  Control  RAM  Sequencer  and  the  Control  RAM.  The  Control  RAM 
requires  256  40-bit  words,  and  is  implemented  using  ten  256-word 
by 4-bit RAMs. The processing section consists of an 8-bit
Register Array and Arithmetic Logic Unit (RALU) and a data storage
RAM  with  a capacity  of  256  8-bit  words.  The  RALU  is  implemented 
with  two  4-bit  RALU  chips  and  the  storage  RAM  with  two  256-word 
by  4-bit  RAM  chips.  This  results  in  a core  processor  total 
chip  count  of  21  devices. 
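
The chip count quoted above follows directly from the word widths and the 4-bit organization of the RAM and RALU chips. The short tally below reproduces the arithmetic; the dictionary keys are descriptive labels chosen for this sketch, and the single sequencer chip is inferred from the stated total.

```python
import math

BITS_PER_CHIP = 4  # both the RAM chips and the RALU chips are 4 bits wide


def chips_for_width(word_bits: int) -> int:
    """Number of 4-bit-wide chips needed to build one word of the given width."""
    return math.ceil(word_bits / BITS_PER_CHIP)


chip_count = {
    "interface chips": 6,
    "I/O Buffer and Control RAM Sequencer": 1,  # inferred from the total of 21
    "control RAM (256 x 40)": chips_for_width(40),
    "RALU (8-bit)": chips_for_width(8),
    "data storage RAM (256 x 8)": chips_for_width(8),
}

total_chips = sum(chip_count.values())

if __name__ == "__main__":
    for name, count in chip_count.items():
        print(f"{name}: {count}")
    print("total:", total_chips)
```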

The  three  types  of  interface  chips  and  the  RALU  are  identical 
to those being developed for the FTSC. The 1024-word by 1-bit RAM
being developed by the FTSC program will be designed so that it
can be easily changed to a 256-word by 4-bit configuration for the
core. The 4-bit FTSC RALU will be directly transferable
to  the  basic  processor  without  modification.  The  only  LSI  device 
to  be  developed  specifically  for  the  core  processor  is  the 
I/O  Buffer  and  Control  RAM  Sequencer. 

This  latter  device  provides  control  to  the  processor  and 
interfaces  the  processor  to  the  bus  system.  It  buffers  data  bytes 
for  transfer  between  the  processor  register  array  and  the  external 
bus  system.  When  commanded  to  load  the  microprogram  control  RAM 
(by a hard-address command), the I/O Buffer and Control RAM
Sequencer  takes  over  control  of  the  processor.  It  sequences 
through  the  RAM  loading  8 bits  at  a time  from  data  routed  to  the 
input  buffer  by  the  bus  controller.  After  the  control  RAM  is 
loaded,  the  processor  is  switched  back  to  normal  (again  by  hard- 
address  command)  and  begins  processing  starting  at  address  zero 
of  the  control  RAM. 

An assignment of the control ROM bits is listed in Table 3-1.
The block diagram shown in Figure 3-2 indicates the RALU control
assignment. The width of the register array input multiplexer
field has been reduced from the 6 bits used in the FTSC to 2 bits,
by having both the general purpose and working register
multiplexers share the same 2 bits.


Figure 3-2  WBSP ALU CONFIGURATION

Table 3-1

CONTROL RAM OUTPUT SIGNAL ASSIGNMENTS

Number
of Bits    Function

   3       Control RAM sequencer branch address
           (Up to 7 different branch condition
           codes may be specified).
   8       Control RAM next address
   3       Working register WX output select
   3       Working register WY output select and
           General Purpose register RA output select
   1       Working register WX/WY input select
   2       Working register multiplexer input select;
           General Purpose register multiplexer input select
   1       Working register write clock
   3       General purpose register RB output select
   1       General purpose register RA/RB input select
   1       General purpose register write clock
   3       ALU function select
   3       ALU A, B multiplexer input
   1       Carry input to ALU
   3       I/O Buffer and Memory request and RAM
           address modifier (0, 32, 64)
   2       Special function decode
   1       Read/Write Select
   1       Spare
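
The field widths in Table 3-1 add up to the 40-bit control RAM word. The sketch below packs and unpacks a microinstruction using those widths. It is illustrative only: the field names are hypothetical labels for the table rows, and placing the first-listed field in the most significant bits is an assumption, since the table does not state the bit ordering.

```python
# (name, width) pairs in the order listed in Table 3-1; names are illustrative.
FIELDS = [
    ("branch_condition", 3),
    ("next_address", 8),
    ("wx_output_select", 3),
    ("wy_ra_output_select", 3),
    ("wx_wy_input_select", 1),
    ("register_mux_input_select", 2),
    ("working_write_clock", 1),
    ("rb_output_select", 3),
    ("ra_rb_input_select", 1),
    ("gp_write_clock", 1),
    ("alu_function", 3),
    ("alu_input_mux", 3),
    ("carry_in", 1),
    ("io_memory_request", 3),
    ("special_function", 2),
    ("read_write", 1),
    ("spare", 1),
]

WORD_BITS = sum(width for _, width in FIELDS)


def pack(values: dict) -> int:
    """Pack named field values into one microinstruction word (unset fields are 0)."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value  # assumed ordering: first field is most significant
    return word


def unpack(word: int) -> dict:
    """Split a microinstruction word back into its named fields."""
    values = {}
    shift = WORD_BITS
    for name, width in FIELDS:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    return values
```

A word built with `pack` and passed through `unpack` returns the original field values, which is a convenient check when preparing microcode for loading.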

Although the core processor architecture is generally
conventional, it does contain two somewhat unusual features that
are extremely useful in supplementing certain signal processing
algorithms: (1) Certain of the ALU output bits can be individually
specified as control RAM branch bits. (Usually only sign, carry
out, overflow and the all-zeros conditions are used to effect
branches.) (2) The microcode can specify that certain of the bits
of the data RAM address generated by the processor be complemented,
thereby, in effect, allowing the microcode to modify the
processor-generated base address. The utility of these features will become
apparent in the subsequent discussion.

3.1.1  I/O  BUFFER  AND  CONTROL  RAM  SEQUENCER 

As  already  noted,  the  core  processor  can  be  implemented, 
with  one  exception,  using  chips  currently  being  developed  for  the 
FTSC.  The  FTSC  chip  designs  have  all  been  documented  and  need 
not be described in detail here. The one exception, the I/O
Buffer  and  Control  RAM  Sequencer  chip,  was  developed  specifically 
for  the  WBSP.  This  chip  is  described  in  the  following  paragraphs. 

3.1.1.1  I/O Pin Designations

Seventy-five  pins  are  needed  for  the  I/O  Buffer  and  Control 
RAM  Sequencer.  The  assignments  for  these  pins  are  listed  in 
Table  3-2. 

3.1.1.2  General LSI Specifications

The  requirements  imposed  on  the  I/O  Buffer  and  Control  RAM 
Sequencer  are  as  follows: 

1)  Facilitate loading the control RAM from the system
bus, independent of RALU and scratchpad RAM.

2)  Provide buffer registers for the data and address
buses.

3)  Provide microprogram control logic necessary to
operate the processor.

4)  Provide the interface between the bus interface
chips and the processor itself.

5)  Coordinate interprocessor communication.

6)  Provide RAM scratchpad address modifications.

Table 3-2

I/O BUFFER AND CONTROL RAM SEQUENCER PIN DESIGNATIONS

Quantity    Label                   Mnemonic

            Address Bus             MAB0-7
            Control Bus             REQ, ACK, R/W, MYADD, MXACK, MTIYHAl,
                                    IMODEN, MCLK, MODCLR, GDEDCA, GDEDCD
            Data Bus (Interface)    MDB0-7
            Control RAM Enables     CEN0-4, Write en., ADD STB, PROCEN
            Control RAM Address     ROADD0-7
            Branch Variables        Zero, BCO, B0, B1, B5, B6, Interrupt
            Microcode               RF1, RF2 (Bits 6,7)
            Data Bus (Processor)    PDB0-7
            RAM Address In          RAADDI0-4
            RAM Address Out         RAADDO0-4
            Power

3.1.1.3  Functional Block Descriptions

The following is a brief description of each of the functional
blocks shown in the I/O Buffer and Control RAM Sequencer
block diagram (Figure 3-3):

1)  Input Data Buffer

Provides 1 word of data storage (8 bits). This buffer is used for
information transfer from the system bus to the processor. It is
used to load the control RAM and serves as an input register for
the RALU and scratchpad memory. When data are entered into the
buffer, a flag is set to interrupt the processor.

2)  Output Data Buffer

Provides 1 word of data storage (8 bits). This buffer's purpose is
to hold data that are being transferred from the RALU to the
system bus.

3)  Control Logic

This block controls the distribution of the processor's control
signals.

4)  Address Buffer

Provides 2 words (16 bits) of address storage. This information is
provided by the address interface connected to the system bus. The
control logic provides the signals necessary to operate the buffer.

5)  Hard/Soft Instruction Decode

This block detects the module's hard and soft addresses. Once it
has detected its hard address, it starts the control RAM address
and frame counter. Information for this block is provided by the
address buffer.

6)  Control RAM Address & Frame Counter

Once activated, this block sequences from address 0 through
address 255 for each of the frames 0-4 of the Control RAM. This
block provides the proper control signals so that information can
be transferred from the input buffer to the control RAM.

7)  Control RAM Address Multiplexer

This block selects the source of the control RAM address: the
control RAM address and frame counter or the microprogram control
logic block. Only when the address and frame counter is active
does the multiplexer allow that address information to be put on
the bus. In all other cases, the microprogram control logic
controls the address bus.

8)  Microprogram Control Logic

This block creates the next control RAM address. Information to
this block is provided by selected bits of the microinstruction.
These bits along with those from the branch variables are gated
through the control RAM address multiplexer.

9)  RAM Scratchpad Address Modifier

This block receives selected address bits from the RALU and
modifies the address in accordance with the microcode. The
modified address is used to select the next RAM scratchpad word.
Three conditions are allowed:

Address + 0

Address + 32

Address + 64

3.2  CORE  PROCESSOR  MICROCODE  FIELD  DEFINITIONS 

Figure 3-4 shows the core processor microcode fields as they
appear at the control RAM output. Some of the microcode fields
have been combined to yield a more efficient microcode. Table
3-3 provides a detailed description of the microcode fields.

Table 3-3

MICROCODE FIELD DEFINITIONS

CONTROL ROM BRANCH CONDITIONS

ROM Sequencer            ROM Address
Field RF1          5        6        7

0 0 0              ROMA5    ROMA6    ROMA7
0 0 1                       SUM6     SUM7
0 1 0                       SUM5     SUM6
0 1 1                       ROMA6    SUM0
1 0 0                                SUM1
1 0 1                                CARRY OUT
1 1 0                                SUM ZERO
1 1 1                                INTERRUPT

CONTROL ROM ADDRESS FIELD

ROM Field RF2

These bits are used directly to create part of the
next ROM address.

These bits specify the rest of the next ROM address,
which is dependent on the ROM branch conditions.

RAYTHEON COMPANY
EQUIPMENT DIVISION

Table 3-3 (Cont'd.)

WORKING REGISTER WX OUTPUT SELECT

ROM Field RF3    Register Select       Assignment

0 0 0            Working Register 0    WR0
0 0 1            Working Register 1    WR1
0 1 0            Working Register 2    WR2
0 1 1            Working Register 3    WR3
1 0 0            Working Register 4    Extension Register
1 0 1            Working Register 5    Memory Data
1 1 0            Working Register 6    Memory Address
1 1 1            Working Register 7    Program Counter

Table 3-3 (Cont'd.)

WORKING REGISTER WX/WY INPUT SELECT

ROM Field RF5    Assignment

0                WX
1                WY

GENERAL PURPOSE REGISTER RA/RB INPUT SELECT

ROM Field RF9    Assignment

0                RB
1                RA



Table 3-3 (Cont'd.)

WORKING REGISTER WY OUTPUT SELECT AND GENERAL
PURPOSE REGISTER RA OUTPUT SELECT

ROM Field RF4    Register Select      Assignment

0 0 0            Register Select 0    WY0 + R0
0 0 1            Register Select 1    WY1 + R1
0 1 0            Register Select 2    WY2 + R2
0 1 1            Register Select 3    WY3 + R3
1 0 0            Register Select 4    WY4 + R4
1 0 1            Register Select 5    WY5 + R5
1 1 0            Register Select 6    WY6 + R6
1 1 1            Register Select 7    WY7 + R7

Table 3-3 (Cont'd.)

GENERAL PURPOSE REGISTER ARRAY INPUT MULTIPLEXING

ROM Field RF6    Register Input                             End Bits Input

0 0              Working Register Y                         NONE
0 1              SUM Bus                                    NONE
1 0              SUM Bus Right Shifted by one               Determined by end condition bits
1 1              SUM Bus Left Shifted by one                Determined by end condition bits

WORKING REGISTER ARRAY INPUT MULTIPLEXING

ROM Field RF6    Register Input                             End Bits Input

0 0              SUM Bus                                    NONE
0 1              Working Register Y                         NONE
1 0              Working Register Y Right Shifted by one    Determined by end condition bits
1 1              Working Register Y Left Shifted by one     Determined by end condition bits

Table 3-3 (Cont'd.)

GENERAL PURPOSE REGISTER RB OUTPUT SELECT

ROM Field RF8    Register Select          Assignment

0 0 0            General Purpose Reg 0    R0
0 0 1            General Purpose Reg 1    R1
0 1 0            General Purpose Reg 2    R2
0 1 1            General Purpose Reg 3    R3
1 0 0            General Purpose Reg 4    R4
1 0 1            General Purpose Reg 5    R5
1 1 0            General Purpose Reg 6    R6
1 1 1            General Purpose Reg 7    R7

Table 3-3 (Cont'd)

ALU FUNCTIONS

ROM Field              CARRY IN                                     CARRY IN (B = 0)
S0 S1 S2    CIN = 0               CIN = 1                CIN = 0               CIN = 1

0 0 0       AB                    AB                     0's                   0's
0 0 1       A + B                 A + B                  A                     A
0 1 0       A'B                   A'B                    A (1's complement)    A (1's complement)
0 1 1       A ⊕ B                 A ⊕ B                  A                     A
1 0 0       A plus B              A plus B plus 1        A                     A plus 1
1 0 1       A minus B minus 1     A minus B              A minus 1             A
1 1 0       B minus A minus 1     B minus A              A (1's complement)    -A (2's complement)
1 1 1       B (1's complement)    -B (2's complement)    1's                   0's

ALU INPUT MULTIPLEXING

ROM Field RF12          Assignment
A B C            ALUA     ALUB

0 0 0            GPRB     WRX
0 0 1            GPRA     WRX
0 1 0            GPRB     WRXLS
0 1 1            WRX      0's/2
1 0 0            GPRB     GPRA
1 0 1            WRX      GPRA
1 1 0            GPRB     0's/2
1 1 1            GPRA     0's/2

ALU B LEG SELECT

ALU A LEG SELECT

Table 3-3 (Cont'd.)

CARRY INPUT TO ALU

ROM Field RF13    Assignment

                  No Carry
                  Carry

I/O BUFFER & MEMORY REQUEST

ROM Field RF14    Assignment

                  Address Modifier Disable (RAM Address Unmodified)
                  Increment by 32
                  Increment by 64
                  Undefined
                  Input Request
                  Output Request
                  Undefined
                  ALU Function (Default Value)

SPECIAL FUNCTION DECODE

ROM Field RF15    Assignment

                  No Operation
                  Reset Overflow Flip Flop
                  Set Overflow Flip Flop
                  Enable Carry Flip Flop

READ/WRITE SELECT

ROM Field RF16    Assignment

                  Write
                  Read

ROM Field RF17    Assignment

                  Undefined
                  Undefined

3.3  CORE PROCESSOR UTILIZATION AS A DECODER

The  core  processor  is  used  without  modification  to  implement 
the Encoder/Decoder processor in the WBSP. In order to assess its
suitability  for  this  purpose,  the  entire  microcode  sequence  was 
defined  for  three  different  decoding  algorithms:  the  Viterbi 
decoding  algorithm  for  a constraint-length  7,  rate  1/2  convolutional 
code; the dual-3 Viterbi decoding algorithm; and a constraint-length
10,  rate  1/2  convolutional  code  feedback  decoding  algorithm.  The 
constraint-length  7 Viterbi  decoding  algorithm  was  selected  because 
it  was  judged  to  be  the  most  complex  algorithm  that  might  be 
implemented  on  board  a satellite.  It  now  appears  that  the  dual-3 
decoding  algorithm  and  either  the  feedback  decoding  algorithm  or 
a Reed-Solomon  code  decoding  algorithm  of  comparable  complexity 
will actually be implemented. Nevertheless, the following
paragraphs demonstrate the versatility of the proposed means of
implementing  these  functions.  Since,  in  particular,  the  constraint- 
length  7 Viterbi  decoding  algorithm  can  be  handled  at  the  rates  of 
concern  here  using  the  proposed  approach,  it  can  be  concluded  that 
any decoding requirement placed on the WBSP can be similarly
accommodated. 

3.3.1  THE  VITERBI  DECODER  METRIC  PROCESSOR 

Two decoder processors are used to implement the
constraint-length 7 Viterbi decoding algorithm. The metric processor
is described first; the path processor is described in paragraph
3.3.2. It is assumed that soft-decision (3-bit quantized) data is
deposited in the processor's input buffer. The data representing
the first decision associated with a given branch are in bit positions
1-3, and those representing the second decision are in bit positions
5-7 of the 8-bit input word. The result of each metric processor
decision is stored in the output buffer. These decision bits are
packed eight to a word to be transferred to the path processor.
This output need only be fetched by the bus controller eight times
during each information bit period in order to preserve all
sixty-four decisions.
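
The packing just described can be sketched as follows. This is only an illustration: the function name and the bit order within each word (first decision in the most significant bit) are assumptions, since the report does not state the bit order.

```python
def pack_decisions(decisions):
    """Pack 64 one-bit path decisions into eight 8-bit words."""
    if len(decisions) != 64:
        raise ValueError("expected 64 decisions per information bit")
    words = []
    for start in range(0, 64, 8):
        word = 0
        for bit in decisions[start:start + 8]:
            # Assumed order: the first decision of each group lands in the MSB.
            word = (word << 1) | (bit & 1)
        words.append(word)
    return words
```

Eight fetches of one word each per information bit period then carry all sixty-four decisions to the path processor.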

The utilization of the RALU registers is summarized in Table
3-4. (The eight general-purpose registers are denoted R0-R7,
and the four working registers W0-W3; E, A, P, and M denote,
respectively, the extension, address, program counter, and memory
data registers, and I denotes the interface buffer register.) The
storage memory is organized as follows: the first and third
64-word quadrants provide two complete metric memories. One memory
is read from and the other written to during one decoding cycle, and
the two roles are reversed during the following cycle. The second
quadrant is used to store the two-bit patterns associated with
each of the sixty-four branches. Since unused memory is available,
this information is duplicated in the fourth quadrant. This
duplication allows the appropriate branch information to be
fetched from the same relative address regardless of which of the
two metric memories is being used.
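
A minimal sketch of this memory organization is given below. It assumes that the four quadrants are contiguous 64-word blocks in ascending address order and that the "relative address" of the branch information is a fixed 64-word offset from the metric address; the function names and the term "state" for the index within a quadrant are illustrative, not taken from the report.

```python
QUADRANT_WORDS = 64  # each quadrant of the 256-word storage memory

def metric_address(bank, state):
    """Address of a metric in metric memory `bank` (0 = first quadrant,
    1 = third quadrant), under the assumed contiguous layout."""
    return 2 * bank * QUADRANT_WORDS + state

def branch_pattern_address(bank, state):
    """Branch bit pattern stored one quadrant above the matching metric,
    so the offset is the same whichever metric memory is in use."""
    return metric_address(bank, state) + QUADRANT_WORDS

def next_banks(read_bank):
    """Swap roles after a decoding cycle: returns (read_bank, write_bank)."""
    return 1 - read_bank, read_bank
```

Because the second-quadrant table is duplicated in the fourth quadrant, the same offset reaches the branch information from either metric memory.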

The microinstructions needed to determine the sixty-four path
decisions and to update and normalize the path metrics following
each new received branch are itemized in Figure 3-5. The
abbreviations used there are defined in Table 3-5. A total
of 1234 clock cycles (with two cycles allowed for each memory
access) are required for each information bit. Present estimates
are that the processor can operate using a 150-nsec clock. If
this is the case, a single metric processor could decode
information at a rate of roughly 5400 bps.
Figure 3-5


METRIC  PROCESSOR  MICROROUTINE 


START  (FOLLOWING  INTERRUPT) 


I R5 
R5"R6 

R6  ’ R3  ^ R6 
SRS,  R5 

SRS,  R5 

SRS,  R5 

SRS,  R5 

R5+R6"W0 
W0  + r4  -*■  WQ 


10  h>w3  + r5-r5 

W2  + R6  " R6~ 
n ->w,  + R,  *re 


W0  + R6  - V 

R6  - R5-M« 


SLS,  R0 
-BO,  R5 


SRS,  W2 
W2  “ Rg  ^W2 
W2  +R5-w2 
W2  + R4  - w2 
R2  ~ WQ  + Wx 
R2  " M2  '*W3 


Rg  (A  + 32) 
E ♦ (A  + 64) 
B67,  E 


wQ  + r5-r5 

wi  + r*  -v 
’ w2  + r5  *r5 
W3  + r6  -r6 


SLSC,  Rq 
-BO,  Rg. 


>R4  ' R1 
STZ,  R4 

INC,  P 


CMP,  E 

REPEAT  BOXED- IN 
PART 


INC,  A 
-B01,  P 
> STZ,  P 


STZ,  A 


WAIT  FOR  INTERRUPT 
TABLE 3-5

MICROCODE DEFINITIONS

ABBREVIATION     DEFINITION

R → S            Store contents of Register R into Register S.

R → (S)          Store contents of Register R into memory location
                 specified in Register S.

R ← (S)          Load Register R with the contents of the memory
                 location specified in Register S.

R + S → Q        Add (2's comp.) the contents of registers R and S and
                 store the result in Register Q.

R · S → Q        Store in Register Q the logical "and" of the contents
                 of registers R and S.

R ∨ S → Q        Store in Register Q the logical inclusive "or" of the
                 contents of registers R and S.

R ⊕ S → Q        Store in Register Q the logical exclusive "or" of the
                 contents of registers R and S.

R - S → Q        Subtract (2's complement) contents of Register S from
                 Register R and store result in Register Q.

(not legible)    Branch on the ALU's carry-out bit.

Bij, R           Branch on bits i, j in Register R.

Bj, R            Branch on bit j in Register R.

CMP, R           Replace contents of Register R by its one's complement.

(not legible)    Decrement (by one) the contents of Register R.

INC, R           Increment (by one) the contents of Register R.

(not legible)    Jump unless Register R contains only zeros.

(not legible)    Jump if Register R contains only zeros.

(not legible)    Cyclically rotate contents of Register R one position
                 to the left.

SLS, R           Shift contents of Register R one position left; shift
                 zero into least significant (right-most) bit position.

(not legible)    Cyclically rotate contents of Register R one position
                 to the right.

SRS, R           Shift contents of Register R one position to the right;
                 shift a zero into most significant (left-most) bit
                 position.

SLSC, R          Shift contents of Register R one position to the left
                 and shift a one into the least significant bit position.

STZ, R           Store all zeros into Register R.
3.3.2 VITERBI DECODER PATH PROCESSOR

The  path  processor  random-access  memory  is  functionally 
divided  into  sixty-four  segments.  Each  segment,  consisting  of 
four  consecutive  addresses,  is  used  to  store  the  thirty-two  bits 
representing  one  of  the  sixty-four  contending  paths.  Double 
storage  of  this  information  is  obviated  by  dynamically  reassigning 
path  addresses.  Initially,  for  example,  path  0 is  stored  in 
locations  0,  1,  2 and  3 and  path  32  is  stored  in  locations  128, 
129,  130  and  131.  The  first  decision  made  by  the  metric  processor 
determines  whether  the  new  path  0 is  to  be  defined  by  the  old 
path  0 (augmented  by  0)  or  by  the  old  path  32  (augmented  by  1) . 
Similarly,  the  second  decision  determines  whether  the  new  path  1 
is  to  be  defined  by  the  old  path  0 or  by  old  path  32.  Once  the 
results  of  the  first  two  decisions  are  available,  the  old  paths 
0 and  32  are  no  longer  needed.  Thus,  new  path  1 can  be  stored 
in  locations  128,  129,  130  and  131. 

Since the path addresses are continually changing, it is
necessary to keep a record of the current assignments.
This is done quite simply by defining two pointers (pointers 2 and
3 in Table 3-4), one of which is used to increment the address to
proceed from one path to the next and the second to complement
one of the bits in the address in order to generate the address
of the competing path. These pointers are then shifted one
position to the right (modulo 6) following each pass through the
entire path memory. Initially, then, pointer 2 is set to 100000
and pointer 3 to 000001. (The two least significant bits are
irrelevant to this discussion and are hence ignored.) The base
address A (initially 000000) is added to pointer 2 in order to
define the address of the competing path. (Addition here is
defined as addition, modulo 63, with the result interpreted as an
integer in the interval (1, 63).) The next base address
(designating the location of one of the two paths involved in the
next two decisions) is generated by adding (modulo 63) pointer 3
to the current base address, etc. (Thus, initially, the base
addresses are generated in the sequence 0, 1, 2, ..., and the
competing addresses in the sequence 32, 33, 34, ....) After each
pass through the entire memory (producing one decoded bit), both
pointers are rotated one position to the right, modulo 6. Thus,
pointer 2 is set to 010000 and pointer 3 to 100000 during the
second pass (producing the base address sequence 0, 32, 1, 33, 2,
34, ..., and the competing address sequence 16, 48, 17, 49, 18,
50, ...). It is not difficult to verify that this address-generating
algorithm does indeed produce the proper base sequence and the
proper relationship between the base and competing addresses. Since
no location is stored into until the previous contents of that
location are no longer of interest, double storage is not necessary.
Moreover, the two address sequences are easily generated using
this simple algorithm.
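
The address-generating rule above can be checked with a short sketch. The function names are illustrative, and the assumption that each pass visits thirty-two base/competing pairs (one pair per two decisions) is an inference from the text rather than a statement in it. Addresses here are 6-bit segment numbers; the two least significant bits of the full memory address are ignored, as in the discussion above.

```python
def add_mod63(a, b):
    """Addition modulo 63, with nonzero sums kept in the interval 1..63."""
    total = a + b
    if total == 0:
        return 0
    reduced = total % 63
    return reduced if reduced else 63

def rotate_right6(value):
    """Rotate a 6-bit pointer one position to the right."""
    return ((value >> 1) | ((value & 1) << 5)) & 0x3F

def pass_addresses(pointer2, pointer3):
    """(base, competing) segment pairs for one pass through path memory."""
    pairs = []
    base = 0
    for _ in range(32):  # assumed: 32 pairs cover all 64 paths
        pairs.append((base, add_mod63(base, pointer2)))
        base = add_mod63(base, pointer3)
    return pairs

def address_schedule(n_passes):
    """Pairs for successive passes, rotating both pointers after each."""
    pointer2, pointer3 = 0b100000, 0b000001
    schedule = []
    for _ in range(n_passes):
        schedule.append(pass_addresses(pointer2, pointer3))
        pointer2 = rotate_right6(pointer2)
        pointer3 = rotate_right6(pointer3)
    return schedule
```

Under these assumptions the first two passes reproduce the sequences quoted in the text, and every pass touches each of the sixty-four segments exactly once.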

A third  pointer  (pointer  1 in  Table  3-4)  and  a modulo-4 
counter  are  used  to  keep  track  of  the  location  of  the  oldest  bit 
in  each  path.  The  pointer  is  rotated  one  position  to  the  right 
(modulo  8)  following  each  pass  through  the  memory;  the  counter  is 
augmented  by  one  after  each  eight  pointer  rotations.  The  counter 
thus  indicates  which  of  the  four  memory  locations  corresponding  to 
any  given  path  contains  the  oldest  bit  and  the  pointer  identifies 
the  position  of  the  oldest  bit  in  that  location.  Allowing  the 
location  of  the  oldest  bit  to  change  in  this  manner  eliminates  the 
need  to  shift  each  32-bit  path  during  each  pass  through  the 
memory.  Since  it  is  obviously  much  easier  to  rotate  an  eight-bit 
pointer  and  to  increment  a modulo-four  counter  than  to  shift  64 
32-bit paths, this procedure significantly reduces the amount of
time  needed  to  decode  each  bit. 
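
The bookkeeping described in this paragraph can be sketched as below. The class name, the one-hot representation of the pointer, and its initial position are assumptions made for illustration only.

```python
class OldestBitTracker:
    """Tracks where the oldest bit of every 32-bit path currently sits."""

    def __init__(self):
        self.pointer = 0b10000000  # one-hot bit mask (assumed start position)
        self.word = 0              # modulo-4 counter: which of the 4 words
        self._rotations = 0

    def advance(self):
        """Call once after each pass through the path memory."""
        self.pointer = ((self.pointer >> 1) | ((self.pointer & 1) << 7)) & 0xFF
        self._rotations += 1
        if self._rotations == 8:   # eight rotations: move to the next word
            self._rotations = 0
            self.word = (self.word + 1) % 4

    def oldest_bit(self, segment):
        """Extract the oldest bit from a four-word path segment."""
        return 1 if segment[self.word] & self.pointer else 0
```

Only one 8-bit rotation and an occasional counter increment are needed per pass, in place of shifting all sixty-four stored paths.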

The microinstructions needed to manipulate the path memory
and to determine the decoded output (using a majority vote on the
oldest bit in each path) are listed in Figure 3-6. (The
abbreviations used there are those defined in Table 3-5.) The
sixty-four metric processor decisions corresponding to one
information bit are presented to the path processor in eight
eight-bit words (cf. Section 2). The decoded output bits are packed
eight to a word. If two 150-nanosecond clock cycles are required for
each memory access and one clock cycle for all other
microinstructions, a maximum of 1717 clock cycles are needed to
decode an information bit. Thus, the path processor can operate at
information rates of up to 3800 bps.
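
The rates quoted for the metric and path processors follow directly from the cycle counts and the estimated 150-nsec clock. A small worked calculation, with an illustrative function name, is:

```python
def max_information_rate(cycles_per_bit, clock_period_ns=150.0):
    """Information rate in bits/second for a given cycle count per bit."""
    seconds_per_bit = cycles_per_bit * clock_period_ns * 1.0e-9
    return 1.0 / seconds_per_bit

metric_rate = max_information_rate(1234)  # metric processor, paragraph 3.3.1
path_rate = max_information_rate(1717)    # path processor, paragraph 3.3.2
```

With these inputs the metric processor comes out slightly above 5400 bps and the path processor slightly above 3800 bps, consistent with the rounded figures in the text.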

3.3.3 DUAL-3 VITERBI DECODER IMPLEMENTATION

The dual-3 Viterbi decoding algorithm, as implemented by a
decoder processor, can be conveniently divided into three phases.
During phase 1, the eight 3-bit soft-decision characters
representing the first symbol of the branch being processed are
received and stored in registers R0 through R7. (For purposes of this
discussion, it is assumed that these characters are transferred as
8 data words.) Each of the 8 path metrics is then added to each
of these 8 "delta-metrics" to form 64 partially augmented metrics,
corresponding to the 64 extended paths, which are stored in the
data RAM for subsequent processing.

The second phase (phase 2A) starts when the eight 3-bit
characters representing the second branch symbol are received and
again stored in registers R0 through R7. These second
delta-metrics are similarly added to the 64 partially augmented
metrics already stored to form the 64 fully augmented metrics
corresponding to the 64 contending paths. The order in which this
second set of delta-metrics is used is determined by the contents of
the "address modifier" section of the data memory. (The data
memory partitioning is shown in Table 3-6.)

The third decoding phase (phase 2B) begins as soon as all 64
augmented metrics have been formed. During this phase, the smallest
of the i-th, (i + 8)-th, (i + 16)-th, ..., (i + 56)-th augmented
metrics is determined for each i (i = 1, 2, ..., 8) and used to
define the 8 new metrics and the 8 new paths. At the same time, the
oldest symbol is extracted from the path represented by the minimum
metric and stored in register R7. Any necessary metric
normalization is carried out at this time.
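
The three phases can be sketched in a few lines. This is a sketch under stated assumptions, not the microroutine of Figure 3-7: the `modifiers` table (which second delta-metric goes with which extended path) is a hypothetical stand-in for the address-modifier memory, and normalization by subtracting the smallest new metric is assumed because the report does not state the method.

```python
def dual3_update(metrics, delta1, delta2, modifiers):
    """One dual-3 branch update: returns (new_metrics, survivors)."""
    # Phase 1: each of the 8 path metrics plus each first delta-metric.
    partial = [[metrics[p] + delta1[s] for s in range(8)] for p in range(8)]
    # Phase 2A: add the second delta-metric selected by the modifier table.
    full = [[partial[p][s] + delta2[modifiers[p][s]] for s in range(8)]
            for p in range(8)]
    # Phase 2B: for each new path s, keep the smallest of its 8 contenders
    # (entries s, s + 8, ..., s + 56 of the flattened 64-entry list).
    new_metrics, survivors = [], []
    for s in range(8):
        best = min(range(8), key=lambda p: full[p][s])
        survivors.append(best)
        new_metrics.append(full[best][s])
    floor = min(new_metrics)  # assumed normalization: subtract the minimum
    return [m - floor for m in new_metrics], survivors
```

The flattened index 8p + s used here matches the ordering of the contending metrics in Table 3-6 (M0 + ΔM0, M0 + ΔM1, ..., M7 + ΔM7).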

The microroutine used to implement this algorithm is shown in
Figure 3-7. The RALU register utilization during each of the
three decoding phases is presented in Table 3-7. If the processor
is clocked with a 150-nsec period, it can decode dual-3 encoded
information at a rate of 10,500 bps.

3.3.4  FEEDBACK  DECODER  IMPLEMENTATION 

The feedback decoder implements the algorithm indicated in
Figure 3-8. After each new pair of code bits is received
(consisting of one information bit and one parity bit), the processor
determines a new 11-bit syndrome. The 8 most significant bits of
this syndrome define an address in memory and the 3 least
significant bits identify which of the 8 bits in this stored word is
to be used as the output correction bit. The microroutine used to
implement this algorithm is shown in Figure 3-9. The RALU register
utilization is described in Table 3-8. (The information and
syndrome bit numbers shown there correspond to the numbers indicated
in Figure 3-8.) The processor can implement this algorithm at
information rates of up to 211,000 bits/second.
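
The syndrome lookup can be illustrated as follows. The function name and the convention that bit 0 is the least significant bit of the stored word are assumptions; the mapping from the three oldest syndrome bits to the bit position used (000 selects bit 7, 111 selects bit 0) follows Table 3-8.

```python
def correction_bit(syndrome_table, recent8, oldest, next_oldest, third_oldest):
    """Look up the output correction bit for an 11-bit syndrome.

    recent8 : the 8 most recent syndrome bits, used as the memory address
    oldest, next_oldest, third_oldest : the 3 oldest syndrome bits (0 or 1)
    """
    word = syndrome_table[recent8 & 0xFF]
    use_bit = 7 - ((oldest << 2) | (next_oldest << 1) | third_oldest)
    return (word >> use_bit) & 1  # assumed numbering: bit 0 is the LSB
```

Packing eight correction bits into each stored word is what allows an 11-bit syndrome to be resolved with a 256-word table.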
Table 3-6

DUAL-3 DECODER MEMORY PARTITIONING

ADDRESS RANGE          CONTENTS

00000000 - 00111111    Path Memory A:
                         Path 0 (symbols 1 and 2)
                         Path 1 (symbols 1 and 2)
                         ...
                         Path 7 (symbols 1 and 2)
                         Path 0 (symbols 3 and 4)
                         Path 1 (symbols 3 and 4)
                         ...
                         Path 7 (symbols 15 and 16)

01000000 - 01000111    Metric Memory:
                         Metric 0
                         ...
                         Metric 7

01001000 - 01001111    Phase 2A Address Modifiers:
                         00000000, 00000101, 00000001, 00000100,
                         00000010, 00000111, 00000011, 00000110

01010000 - 01010011    Phase 2B Variables:
                         Normalizer
                         Path Memory A/B Pointer
                         Symbol Mask
                         (fourth entry not legible)

01010100 - 01011111    Unused

10000000 -             Path Memory B
                         (same organization as Path Memory A)

11000000 - 11111111    Contending Metrics:
                         M0 + ΔM0
                         M0 + ΔM1
                         ...
                         M0 + ΔM7
                         M1 + ΔM0
                         ...
                         M7 + ΔM7
TABLE 3-7

DUAL-3 DECODER REGISTER USAGE

REGISTER  PHASE 1                   PHASE 2A                  PHASE 2B

R0        ΔM00                      ΔM01                      Scratch Pad

R1        ΔM10                      ΔM51                      Index of current best metric

R2        ΔM20                      ΔM11                      Minimum metric found in this
                                                              sequence (initialized to
                                                              11111111)

R3        ΔM30                      ΔM41                      Metric pointer (initialized
                                                              to W3)

R4        ΔM40                      ΔM21                      Normalizer

R5        ΔM50                      ΔM71                      Path Memory A/B Pointer
                                                              (alternates between 00000000
                                                              and 10000000)

R6        ΔM60                      ΔM31                      Symbol Mask (alternates
                                                              between 00000111 and
                                                              00111000)

R7        ΔM70                      ΔM61                      DECODED SYMBOL

W0        01000000                  01000000                  01000000

W1        Base address for          Base address for          Current metric pointer (used
          contending metrics        contending metrics        instead of R3 on odd cycles)
          (initialized to           (initialized to
          11000000)                 11000000)

W2        Index for contending      Index for contending      Index used during metric
          metrics (initialized      metrics (loaded from      comparisons
          to 00000000)              memory)

W3        00001000                  00001000                  00001000

A         Address Register (used    Address Register (used    Address Register (used to
          to load present           to load W2)               load contending metrics)
          metrics)

P         Address Register (used    Address Register (used    Address Register (used to
          to update metrics)        to update metrics)        update path and metric
                                                              memories)

M         Scratch Pad               Scratch Pad               Scratch Pad

E         Scratch Pad               Not Used                  Scratch Pad
TABLE 3-8
FEEDBACK DECODER REGISTER USAGE

R0 = Information bits being corrected (4 to -3)

R1 = Most recent information bits (15 to 8)

R2 = Next most recent information bits (7 to 0)

R3 = Most recent (modified) syndrome bits (15 to 8)

R4 = Next most recent (modified) syndrome bits (7 to 0)

R5 = Third most recent (modified) syndrome bits (-1 to -8)

R6 = Correction bits: [(R5) determines which of the 8 bits is to be used]

R7 = Counter

A = Memory address register [Loads R6 from (A)]

M = 00001000

Syndrome memory organization:

8 most recent (modified) syndrome bits → address
3 oldest (modified) syndrome bits → bit
to be stored in given position at that address:

Oldest    Next Oldest    Third Oldest    Use Bit

  0            0              0             7
  0            0              1             6
  0            1              0             5
  0            1              1             4
  1            0              0             3
  1            0              1             2
  1            1              0             1
  1            1              1             0
Phase 1

Wait for Interrupt
I → R0
Wait for Interrupt
I → R1
Wait for Interrupt
I → R2
Wait for Interrupt
I → R3
Wait for Interrupt
I → R4
Wait for Interrupt
I → R5
Wait for Interrupt
I → R6
Wait for Interrupt
I → R7
STZ, W1
SRSC, W1
SRSC, W1
STZ, W2
W0 → A
3.3.5 DECODER PROCESSOR SUMMARY

The rates at which the decoder processors can implement each
of the algorithms described in the preceding paragraphs are
summarized in Table 3-9. It can be concluded from these
investigations that the core processor is capable of implementing any
decoding algorithm likely to be required for the SSS satellite at
rates well in excess of those actually anticipated.
4.0 INTERLEAVER/DE-INTERLEAVER PROCESSOR

4.1  GENERAL  DESCRIPTION 

The Interleaver/De-interleaver processor consists of a core
processor (cf. para. 3.1) augmented with an interleaving memory
consisting of 3096 25-bit words (24 information bits plus 1 parity
bit). The major variable in this processor design was the
memory size. The core processor can easily handle the
computational loads anticipated for it. Any appreciably simplified
version of this processor (e.g., making it four bits rather than
eight bits wide) would, however, severely reduce its computational
margin without significantly improving its reliability. The memory
size specified here was arrived at based on reliability and functional
partitioning considerations (cf. Section 6).

Since the long-term failure probability of the interleaving
memory is not small compared to that of the core processor itself,
the Interleaver/De-interleaver processor reliability can be
significantly improved by making this memory fault-tolerant. This can
be done by adding three spare bits to each word (making the memory
28 bits wide) and using the bit rippler being developed for the
FTSC to replace failed bits with these spares. This small added
redundancy effectively reduces the hazard rate of the memory by a
factor of 36; i.e., from that associated with the 72 1024 x 1-bit
chips needed to implement a non-redundant memory to that due
solely to the two identical devices needed to implement the rippler
switch. This is because the probability of more than three bit
failures in a 7-year mission is negligible compared to the
probability of a failure in the rippler itself.

Since the information is stored in memory in 25-bit words and
since there are 3096 such words, logic is needed to buffer the
information in 8-bit bytes and to extend the basic processor's
8-bit address field. In addition, the Interleaver/De-interleaver
processor is designed with a single-buffer memory. Thus, memory
fetch and store addresses must be constantly shuffled so that no
attempt is ever made to write information into a memory location
before the information previously stored in that location has been
read. The logic needed to accomplish this buffering and address
manipulation can all be integrated onto a single CMOS/SOS LSI
device. This device, the only special development needed
specifically for the Interleaver/De-interleaver processor, is
discussed in detail in para. 4.2.

It is, of course, possible to avoid the need for the address
shuffling operation mentioned in the previous paragraph by doubling
the size of the memory; half the memory could then be used for
storing new information and the second half for fetching previously
stored information, with the two halves changing roles each frame.
The bit rippling scheme discussed earlier could be used to keep the
reliability penalty of the doubled memory nearly as small as that
of the single memory. The major penalty of doubling the memory,
therefore, is in the increased weight and volume of the resulting
WBSP. Since the Space Shuttle is to be used to launch the SSS,
this penalty may not be that significant. If a special LSI device
is developed to handle the other memory control and data buffering
functions mentioned above, however, the logic needed to accommodate
address shuffling can easily be integrated onto the same device.
While such a development is not mandatory (a minimal,
double-buffered memory control function could be implemented using
MSI and SSI logic without unacceptably large reliability penalties),
it is particularly appealing when it is recognized that this same
device could be used as a memory control chip in the Demodulator
processor as well (cf. Section 5). For purposes of this discussion,
therefore, it is assumed that a special memory control chip
development will be undertaken and hence that a double-buffered
memory is unnecessary.

4.2  INTERLEAVER  MEMORY  CONTROL  CHIP 

A functional block diagram of the interleaver memory control
chip is shown in Figure 4-1. The data buffer and code checker
consists of a 25-bit register used to buffer, code and check data
being transferred between the processor and memory. A complete
cycle proceeds as follows: Data is fetched from memory; it
arrives as a single 25-bit word consisting of 24 information bits
and one parity bit. Parity is checked and, if found to be correct,
the word is transferred to the processor as three 8-bit bytes in
three successive clock pulses. (If a parity violation is discovered,
the rippler control logic is activated, causing the rippler to be
"exercised" and hence to isolate any faulty bit line and ripple
in a spare. This latter procedure is identical to that used in
the FTSC and is described in detail elsewhere.) Three data bytes
are then accepted from the processor, formatted into a 24-bit
word to which a parity bit is added, and stored to the memory's
next available store address.
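
The byte/word formatting in this cycle can be sketched as follows. Even parity, the placement of the parity bit in the least significant position, and the byte order are all assumptions for illustration; the report does not specify them.

```python
def parity(value):
    """Return 1 if `value` has an odd number of one bits, else 0."""
    return bin(value).count("1") & 1

def pack_word(byte0, byte1, byte2):
    """Format three processor bytes into a 25-bit memory word."""
    info = ((byte0 & 0xFF) << 16) | ((byte1 & 0xFF) << 8) | (byte2 & 0xFF)
    return (info << 1) | parity(info)  # assumed: even parity, parity in LSB

def unpack_word(word):
    """Check parity on a fetched 25-bit word and return its three bytes."""
    if parity(word):  # overall parity must be even under the assumption above
        raise ValueError("parity violation: exercise the rippler")
    info = word >> 1
    return (info >> 16) & 0xFF, (info >> 8) & 0xFF, info & 0xFF
```

A single parity bit detects any odd number of bit errors in the word, which is sufficient to trigger the rippler exercise described above.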

The control logic section shown in Figure 4-1 is a relatively
straightforward combinational logic circuit used to translate the
various input control signals into internal control signals. It also
generates the output strobes used to select the appropriate one
of the three memory subblocks. The store-address and fetch-address
generators and their time-shared stop-address comparator are
described in paragraph 4.2.2. The following paragraphs first
describe the principle under which they operate.
4.2.1 ADDRESS SHUFFLING USING PRIMITIVE ROOTS OVER $GF(2^n)$

Let $\alpha$ be a fixed element in $GF(2^n)$ and consider the mapping
$\beta \to \beta\alpha$ with $\beta$ any element in the same field. If
$\alpha$ and $\beta$ are represented as binary n-tuples, then this
mapping converts the binary sequence $1, 2, 3, \ldots, 2^n-1$ into the
binary sequence $\alpha, 2\alpha, 3\alpha, \ldots, (2^n-1)\alpha$ (with
1 = 000...01, 2 = 000...010, etc.). Since $\alpha \neq 0$ and since
multiplication here is over $GF(2^n)$, all of the n-tuples in this
second sequence are unique. Thus, the mapping $\beta \to \beta\alpha$
interleaves the address sequence $1, 2, 3, \ldots, 2^n-1$. (This is the
normal interleaving procedure implemented with
pseudo-noise-sequence address generators.)

Now consider the operation of a de-interleaver*. Let the
information $W_1, W_2, \ldots, W_N$ be interleaved by the inverse
mapping $\beta \to \alpha^{-1}\beta$ so that it is actually received in
the order $W_{\alpha^{-1}1}, W_{\alpha^{-1}2}, \ldots, W_{\alpha^{-1}N}$.
(For the moment, $N = 2^n-1$.) If this information is stored in memory
in "natural" order (i.e., if $W_{\alpha^{-1}i}$ is stored in location
$i$), it can be successfully de-interleaved by fetching it in the order
$\alpha 1, \alpha 2, \ldots, \alpha N$. This follows because
$W_{\alpha^{-1}i}$ is stored in location $i$; thus $W_i$ must be stored
in location $\alpha i$. If a new block of information
$W'_{\alpha^{-1}1}, W'_{\alpha^{-1}2}, \ldots, W'_{\alpha^{-1}N}$ is
being received and must be stored at the same time the previous block
is being fetched and if only N memory locations are to be used, the new
information $W'_{\alpha^{-1}1}$ must be stored in location $\alpha 1$
since that is the only location initially available (after $W_1$ has
been fetched). By extension, $W'_{\alpha^{-1}i}$ must be stored in
location $\alpha i$. Thus, this second block of information is
de-interleaved by accessing memory in the order
$\alpha^2 1, \alpha^2 2, \ldots, \alpha^2 N$. That is, since
$W'_{\alpha^{-1}i}$ is now stored in location $\alpha i$, $W'_i$ must
be in location $\alpha^2 i$. It thus follows that if new information is
to be stored in those locations just vacated as each word of the
previous block is fetched, then the $\ell$th block should be accessed
in the order $\alpha^\ell 1, \alpha^\ell 2, \ldots, \alpha^\ell N$.

*Although the discussion here concentrates on the de-interleaver,
it is apparent that the same principle applies to the interleaver
as well with $\alpha$ replaced everywhere by $\alpha^{-1}$.

If N is less than $2^n-1$, the above procedure need only be
altered as follows: During the $\ell$th frame, abort all stores to
addresses $a$ for which $\alpha^{-\ell}a > N$ and abort all fetches
from addresses $b$ for which $\alpha^{-(\ell-1)}b > N$. Recall that in
the $\ell$th frame, the addresses are read in the order
$\alpha^\ell 1, \alpha^\ell 2, \ldots$. Since $\alpha^{-\ell}a > N$ if
and only if $a = \alpha^\ell i$ with $i > N$, this rule prevents stores
only to the last $2^n-1-N$ addresses that would normally have been
accessed. Consequently, whenever an access is aborted, the next address
in the sequence can immediately be accessed, regardless of whether the
aborted address is a fetch or a store address; there is no danger that
an attempt will be made to store into a location before its previous
contents have been fetched.

The address generator described in the following paragraphs
is based on the principles just outlined. In this implementation,
$\alpha$ is chosen to be a primitive element in $GF(2^n)$ (although the
concept is equally valid for non-primitive elements) and
shift-registers are used to generate successive powers of $\alpha$. Use
is made of the fact that, for any $\ell$, $\alpha^\ell$ can be
expressed in the form

$\alpha^\ell = a_0 + a_1\alpha + a_2\alpha^2 + \ldots + a_{n-1}\alpha^{n-1}$ with $a_i$ in $GF(2)$.
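
The single-buffer principle can be checked numerically for the case $N = 2^n-1$ with a short simulation. The sketch below assumes n = 3 with $\alpha^3 = 1 + \alpha$ (consistent with the powers of $\alpha$ listed in paragraph 4.2.2) and represents a field element as an integer whose bit k is the coefficient $a_k$; all function names are illustrative.

```python
N_BITS = 3
POLY = 0b1011              # x^3 + x + 1, i.e. alpha^3 = 1 + alpha (assumed)
ALPHA = 0b010
ORDER = (1 << N_BITS) - 1  # 2^n - 1 nonzero field elements

def gf_mul(a, b):
    """Multiply two elements of GF(2^n) by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N_BITS):
            a ^= POLY
    return result

def gf_pow(a, e):
    result = 1
    for _ in range(e % ORDER):
        result = gf_mul(result, a)
    return result

def interleave(block):
    """Transmit order: the j-th word sent is W[alpha^-1 * j]."""
    inv = gf_pow(ALPHA, ORDER - 1)
    return [block[gf_mul(inv, j) - 1] for j in range(1, ORDER + 1)]

def deinterleave_stream(received_blocks):
    """Single-buffer de-interleaver: fetch, then store to the same address."""
    memory, outputs = {}, []
    for frame, rx in enumerate(received_blocks):
        scale = gf_pow(ALPHA, frame)
        out = []
        for i in range(1, ORDER + 1):
            addr = gf_mul(scale, i)
            if frame > 0:
                out.append(memory[addr])  # word from the previous block
            memory[addr] = rx[i - 1]      # reuse the just-vacated location
        if frame > 0:
            outputs.append(out)
    return outputs
```

Each block comes out in natural order one frame after it is received, using only $2^n-1$ memory locations.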

4.2.2  ADDRESS  GENERATOR  IMPLEMENTATION 

An address generator block diagram is shown in Figure 4-2.
For exposition purposes, the address generator depicted is for
$n = 3$ ($N \le 7$); the generalization to the case of interest here
($n = 10$) is obvious.
The vertical shift-register shown at the left in Figure 4-2
is a frame counter. It is initialized to the state 100 and is
shifted once each frame with the control level (from the timing
and control logic) in the low state. Thus, the shift-register
state at the beginning of each frame assumes one of the following
states, in sequence: $\alpha^0 = 100$, $\alpha^1 = 010$,
$\alpha^2 = 001$, $\alpha^3 = 110$, $\alpha^4 = 011$,
$\alpha^5 = 111$, $\alpha^6 = 101$.

The  first  address  in  each  frame  is  generated  in  the  following 
manner:  The  ripple  counter  is  initialized  to  100.  Its  contents 

are  clocked  into  the  shift-register  immediately  below  it  and,  if 
the  top-most  bit  in  the  vertical  shift-register  is  a one,  into  the 
buffer  register  below  that  (which  is  first  cleared) . The  horizontal 
and  vertical  shift-registers  are  then  each  shifted  once  (the  latter 
with  the  control  level  in  the  high  state) , and  the  buffer  register 
again  clocked.  This  last  operation  is  repeated  once  more  (in 
general,  for  a total  of  n times) , at  which  point  the  output  address 
has  been  generated.  The  ripple  counter  is  augmented  by  1 and  the 
procedure  is  repeated  for  each  successive  address. 

If the contents of the ripple counter exceed N and the address
generator is generating store addresses, the frame has been
completed. The frame counter is incremented and the procedure
begins anew. If the generator is producing fetch addresses,
however, the contents of the ripple counter are clocked into the
abort-address shift-register (in the dashed box in Figure 4-2) and
shifted once before they are compared to N. If N is exceeded, the
address in question is aborted and the ripple counter immediately
augmented to the next address.
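
The abort rule can be modeled as shown below, again for n = 3 and under the same assumptions as before ($\alpha^3 = 1 + \alpha$, bit k of an integer holding coefficient $a_k$). Shifting the ripple-counter value once is taken to mean multiplying it by $\alpha$; the function names and the checking routine are illustrative only.

```python
N_BITS = 3
POLY = 0b1011              # x^3 + x + 1 (assumed field polynomial)
ALPHA = 0b010
ORDER = (1 << N_BITS) - 1

def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N_BITS):
            a ^= POLY
    return result

def gf_pow(a, e):
    result = 1
    for _ in range(e % ORDER):
        result = gf_mul(result, a)
    return result

def store_addresses(frame, n_words):
    """Store addresses alpha^frame * i; the frame ends when i exceeds N."""
    scale = gf_pow(ALPHA, frame)
    return [gf_mul(scale, i) for i in range(1, n_words + 1)]

def fetch_addresses(frame, n_words):
    """Fetch addresses alpha^frame * i, aborted when alpha * i exceeds N."""
    scale = gf_pow(ALPHA, frame)
    return [gf_mul(scale, i) for i in range(1, ORDER + 1)
            if gf_mul(ALPHA, i) <= n_words]

def check_single_buffer(n_words, n_frames):
    """True if every fetch reads last frame's data and no store overwrites
    a word that has not yet been fetched."""
    memory = {addr: 0 for addr in store_addresses(0, n_words)}
    for frame in range(1, n_frames):
        pairs = zip(fetch_addresses(frame, n_words),
                    store_addresses(frame, n_words))
        for fetch, store in pairs:
            if memory.pop(fetch, None) != frame - 1:
                return False
            if store in memory:
                return False
            memory[store] = frame
    return True
```

For N = 5 the fetch and store sequences differ within a frame, yet the check passes for every N from 1 to 7 under these assumptions.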

Table 4-1a lists the sequence of addresses generated by the
address generator of Figure 4-2 when $N = 2^n-1 = 7$. In this case,
the fetch and store addresses are identical. Table 4-1b lists the
address sequence when $N = 5$. In this case, the fetch and store
addresses may differ; note, however, that no address is stored to
before its previous contents are retrieved.

TABLE 4-1

ADDRESS GENERATOR ADDRESS SEQUENCES

a) $N = 2^n-1 = 7$    (columns: ADDRESS NUMBER, FRAME NUMBER)

b) $N = 5$            (columns: ADDRESS NUMBER*, FRAME NUMBER)

[Table entries not recoverable.]

*F = Fetch Address
 S = Store Address

4.3  OPERATING  SPEED 

The serial nature of the fetch and store generators obviously
limits the rate at which memory accesses can be made. Nevertheless,
new addresses can be generated at the rate of one address every
1.5 µsec, far in excess of the rate at which they are required or
at which the processor could process the data thus accessed.

Although detailed microroutines have not been developed for
the interleaving or de-interleaving algorithms, it can be readily
established that the number of operations required per information
bit is considerably less than the number needed in either the
Decoder processor or in the Demodulator processor (cf. Section 5).
The interleaving and de-interleaving functions are, of course,
accomplished entirely by the address generation logic described
in paragraph 4.2. The only function remaining for the de-interleaver
to perform, therefore, is the chip-combining function on
the de-interleaved data. The interleaver has only to perform the
rate 1/2 convolutional encoding function on its input data.

Neither function should seriously tax the core processor's
throughput.

5.0 DEMODULATOR PROCESSOR

It  is  anticipated  that  the  Demodulator  processor  will  have  to 
demodulate  both  single-channel  QPSK  and  multi-channel  8-ary  FSK. 

Since  the  latter  demodulation  requirement  is  considerably  more 
complex  than  the  former,  it  will  serve  as  the  basis  for  sizing  the 
processor.  The  processing  requirements  for  both  synchronous  and 
asynchronous  8-ary  FSK  demodulation  are  determined  in  paragraph 
5.1.  The  candidate  Demodulator  processor  is  then  discussed  in 
paragraph  5.2. 

5.1 8-ARY FSK DEMODULATION PROCESSING REQUIREMENTS

Figure 5-1 is a block diagram of the overall signal processing
operations required for demodulation and decoding of 8-ary FSK
channels that are organized in FDMA. The overall processing
includes dehopping of the frequency-hopped waveforms, A/D conversion,
channel demultiplexing, 8-ary tone extraction, soft-decision
detection, diversity chip combining, de-interleaving and decoding.
This section is concerned with the processing operations between
the A/D converter and the de-interleaver. The MFSK demodulator
is assumed to be provided with inphase and quadrature samples from
the dehopping down-converter. The demodulator is required to
provide, for each of several access channels imbedded in the input
data, a series of soft-decision chip-combined metrics corresponding
to each of the eight alternative received signal hypotheses. If
the channels are all mutually synchronized to satellite time,
as in Figure 5-1a, there is no need for AFC and automatic sync
recovery, and the signal extraction process is relatively simple.
If the channels are unsynchronized in time and frequency, as in
Figure 5-1b, the signal extraction is burdened by the need to align
each channel separately in both time and frequency. Table 5-1
shows typical characteristics for 8-ary FSK channels appropriate
for both synchronized and unsynchronized channels.*

The precise time alignment in the synchronous case permits the
use of frequency hop intervals equal to the waveform duration T.
In this case, sequential chips are presumed to be independent with
respect to CW jamming interference, and a sequence of several
successive chips can be combined after soft decoding to achieve
jamming diversity. For the unsynchronized case, the time misalignment
of channels is such that a much slower hop rate is required to
assure recovery of the waveforms of all the channels. A guard
interval is required at both edges of a hop interval to accommodate
the worst case channel time misalignment with the dehopping
converter.

The availability of precise synchronization (or lack thereof)
also imposes special considerations in channel demultiplexing and
signal extraction processes. In the case of synchronized signals,
the waveforms in all access channels arrive at the satellite aligned
in both frequency and time. If the frequency hopping is coordinated
among all access channels so that a fixed relative channel occupancy
with orthogonally spaced tones pertains at the baseband output
of the dehopping heterodyner, then 8-ary tone filtering for a
multitude of access channels can be performed simultaneously with
a single wideband FFT calculation. The processing requirements for
signal separation by means of an FFT are presented in paragraph 5.1.1.

*Some scaling may be needed to relate the results based on these
parameters to those processing requirements actually anticipated
(cf. paragraph 5.2).

In the case of unsynchronized channel groups, the misalignment
τ of the signal with respect to satellite time can be several
intervals of the signal duration T and the frequency offset can be
as much as ^. In the unsynchronized arrangement, timing and AFC
control loops are required within the processor to acquire proper
registration before tones can be correctly demodulated. This is
most efficiently performed by a cascade arrangement of a down-
sampling channel demultiplexer followed by separate processing in
each channel to recover the associated 8-ary FSK tone structure.
With this tandem processing arrangement, the timing and frequency
corrections can be applied independently in each channel. The
processing requirements for demultiplexing unsynchronized channels
are presented in paragraph 5.1.2. The additional processing
requirements associated with time and frequency control loops for
the individual channels are shown in paragraph 5.1.3. Paragraph
5.1.4 describes an algorithm for soft decision decoding that is
applicable for either synchronized or unsynchronized channels.

5.1.1  SIGNAL  EXTRACTION  FOR  SYNCHRONIZED  CHANNELS 

In the synchronized channel arrangement, the 8-ary orthogonal
tone set for a single channel occupies 8/T Hz, where T is the
waveform duration. If the tone assignments for M channels are arranged
without interchannel guard band, the total bandwidth occupied by
the channel group is 8M/T. Interchannel guard bands are not
needed in a fully synchronized system because all the signal
components are orthogonal. The sampling rate required to represent
the multichannel composite signal in complex (I and Q) components is

    f_s ≥ 8M/T

At the Nyquist sampling rate, each signal interval T would contain

    N = f_s T = 8M

samples. These N samples defined over the interval T can be
processed by means of a discrete Fourier transform (DFT) to extract
N orthogonal frequency components with the frequency spacing
between lines equal to 1/T. If the sampling rate is chosen so
that N is a power of two, then the DFT can be conveniently
calculated with the FFT algorithm. This sampling rate constraint
requires that M be a power of two. In actual practice, some guard
band is required at the band edge to accommodate the rolloff of
the alias filter ahead of the A/D converter, so that the edge of the
band is not useful for carrying traffic. Allowing 12.5% guard band
for the alias filter results in a useful channel modularity of
M = 7, 14, 28, etc. when FFT processing techniques are employed.
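The simultaneous extraction described above can be illustrated with a small sketch. It assumes noiseless, perfectly synchronized unit-amplitude tones, a direct DFT in place of an FFT, and a hypothetical slot layout in which channel c occupies lines 8c through 8c+7 and the highest slot is left empty as guard band; none of these details are specified in the text.

```python
import cmath

def dft(x):
    """Direct N-point DFT; an FFT returns the same values when N is a power of two."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def composite_signal(tones, n_pts):
    """One interval T of unit-amplitude tones, one per channel.

    tones[c] is the 8-ary symbol (0..7) on channel c. Channel c is assumed to
    occupy DFT lines 8c .. 8c+7, i.e. adjacent channels with no guard band.
    """
    return [sum(cmath.exp(2j * cmath.pi * (8 * c + t) * n / n_pts)
                for c, t in enumerate(tones))
            for n in range(n_pts)]

def detect_tones(samples, n_channels):
    """Pick the strongest of the eight lines belonging to each channel."""
    spectrum = dft(samples)
    decisions = []
    for c in range(n_channels):
        energies = [abs(spectrum[8 * c + t]) ** 2 for t in range(8)]
        decisions.append(energies.index(max(energies)))
    return decisions

N = 64                          # N = f_s T: eight slots of eight lines each
sent = [3, 0, 7, 5, 1, 6, 2]    # seven traffic channels; the eighth slot is guard band
decided = detect_tones(composite_signal(sent, N), len(sent))
print(decided)
```

Because all tones are orthogonal over the interval T, a single 64-point transform separates the seven channels and recovers each transmitted symbol.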


Table 5-2 shows the processing requirements for the FFT
filter as a function of the number of multiplexed channels. The
operations per sample can be scaled by the channel spacing
Δf = 1.6 KHz to obtain the processing speed per output channel.
To account for the assumption that the alias guard band at band
edge makes only 7/8 of the slots useful for traffic, the wasted
slots were prorated to the other channels to obtain the per channel
processing speeds. In the final columns, the per channel processing
speeds are normalized to throughput in b/s by dividing the previous
results by the 75 b/s channel traffic rate.
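The scaling steps in this paragraph can be checked numerically. The sketch below assumes 3 log2 N real additions and 2 log2 N real multiplications per complex input sample (the FFT counts quoted later for the demultiplex filter), and a guard-band proration factor of 9/8. That factor is an inference from the tabulated values, not something the text states; the text says only that the wasted slots were prorated.

```python
import math

CHANNEL_SPACING_KHZ = 1.6    # channel spacing from the text
GUARD_PRORATE = 9 / 8        # assumed proration factor (inferred from Table 5-2)
TRAFFIC_RATE_BPS = 75.0      # channel traffic rate from the text

def fft_ops_per_sample(n):
    """Real (adds, multiplies) per complex input sample for an n-point FFT."""
    stages = int(math.log2(n))
    return 3 * stages, 2 * stages

def per_channel_kops(ops_per_sample):
    """Operations per second per traffic channel, in thousands."""
    return ops_per_sample * CHANNEL_SPACING_KHZ * GUARD_PRORATE

def per_bit_kops(ops_per_sample):
    """Per-channel rate normalized by the 75 b/s traffic rate."""
    return per_channel_kops(ops_per_sample) / TRAFFIC_RATE_BPS

for n in (64, 128, 256, 512):
    adds, mults = fft_ops_per_sample(n)
    print(n, adds, mults,
          round(per_channel_kops(adds), 1), round(per_channel_kops(mults), 1),
          round(per_bit_kops(adds), 2), round(per_bit_kops(mults), 2))
```

For N = 64 this gives 18 additions and 12 multiplications per sample and 32.4 and 21.6 KOPS/sec per channel, matching the first row of Table 5-2.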


5.1.2  SIGNAL  EXTRACTION  FOR  UNSYNCHRONIZED  CHANNELS 


Extraction of signal components for multichannel 8-ary FSK
modulation when the channels are unsynchronized in time and
frequency can be conducted in two tandem operations. In the first
stage, the channels are separated by a demultiplexing filter
implemented as an FFT. In the second stage, the demultiplexed
subbands are separately processed by an FFT to recover the signal
components of the individual channels. The second stage FFT is
synchronized in time and frequency to the respective channel
modulation.

TABLE 5-2

SIGNAL EXTRACTION FOR SYNCHRONOUS 8-ARY FSK CHANNELS

                      PROCESSING      PROCESSING*      PROCESSING*
  NO.        FFT      PER INPUT       PER INPUT        PER BIT
  CHANNELS   SIZE     SAMPLE          CHANNEL          (KOPS)
  M          N                        (KOPS/SEC)
                      ADD    MULT     ADD     MULT     ADD    MULT
  ----------------------------------------------------------------
  7          64       18     12       32.4    21.6     .43    .29
  14         128      21     14       37.8    25.2     .50    .34
  28         256      24     16       43.2    28.8     .58    .38
  56         512      27     18       48.6    32.4     .65    .43

*Indicated processing applies to a chip combining/coding arrangement
with 8-fold redundancy. For non-redundant transmission schemes,
operations should be scaled by 1/8. Per channel processing is
shown for 75 b/s channels.

A convenient channelization arrangement that is compatible
with the FFT algorithm constraint of f_s T = N, where N is a power
of two, is obtained when the sampling frequency f_s, channel spacing
Δf and signal duration T are related in the following manner:

    f_s = N_1 Δf,
    Δf = N_2 / T,

where N_1 and N_2 are powers of two.

With 8-ary FSK modulation in a channel, the channel occupancy
is 8/T for the most compact arrangement that provides orthogonal
signals. When all channels are synchronized in time and frequency,
the signals in adjacent channels are also orthogonal so that no
guard band is required between channels and a channel spacing
Δf = 8/T can be used. However, lack of synchronization among
channels destroys orthogonality so that the channels must be
separated with adequate guard band to avoid interchannel spillover.
Increasing the channel spacing to 16/T provides a frequency separation
between extremal tone frequencies in adjacent channels of 7/T. This
corresponds to interchannel spillover of about -27 dB for rectangular
pulse envelopes of duration T, which produce a corresponding power
spectrum of the form

    H(f) = [sin(πfT) / (πfT)]^2

If the channel spacing is Δf = 16/T and the sampling frequency
f_s = N_1 Δf, with N_1 a power of two, an FFT demultiplex filter can
be implemented that will provide N_1 resolution lines. Not all of
these lines can be used for access channels because of the need for
guard bands at the band edges to accommodate the rolloff of the
alias filter ahead of the A/D converter. Allowing a 12.5% guard
band for this filter reduces the number of useful channels to (7/8) N_1,
so that channel modularity of M = 7, 14, 28, etc. channels can
be conveniently obtained by FFT processing.
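The quoted spillover figure can be checked against the spectrum of a rectangular pulse, whose power spectrum has the sinc-squared form. The sketch below evaluates that spectrum midway between adjacent nulls as an approximation to each sidelobe peak; the choice of evaluation point is an assumption made here, not a statement from the text.

```python
import math

def spectrum_db(f_times_t):
    """Power spectrum [sin(pi f T) / (pi f T)]^2 of a rectangular pulse, in dB."""
    x = math.pi * f_times_t
    return 10 * math.log10((math.sin(x) / x) ** 2)

def sidelobe_peak_db(k):
    """Approximate peak of the sidelobe lying between the nulls at k/T and (k+1)/T."""
    return spectrum_db(k + 0.5)

for k in (6, 7, 8):
    print(k, round(sidelobe_peak_db(k), 1))
```

The sidelobe just beyond a separation of 7/T peaks near -27 dB, in line with the spillover level cited above.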

5.1.2.1 Demultiplex Filter Processing

Figure 5-2 shows the filter characteristics appropriate for
demultiplexing a multichannel group of unsynchronized 8-ary FSK
signals. In the figure the channel spacing is Δf = 16/T, the
sampling rate of the complex input signal is f_s = N_1 Δf, and the
main lobe spectral width (between nulls) of each demultiplex channel
is Δf. In order to avoid channel spillover among the demultiplex
channels, the first and subsequent sidelobes of the subband filters
are required to be better than 30 dB down.

A demultiplex filter that meets these requirements can be
achieved using a 4N_1-point FFT with raised cosine amplitude
weighting to suppress resolution filter sidelobes. (In fact, with raised
cosine weighting the largest sidelobes will be nearly 40 dB down.)
A 4N_1-point transform is inefficient, however, because this process
computes the spectral components for 4N_1 subbands and only every
fourth such component corresponds to an access channel center
frequency. Also, the output sampling rate for each of the 4N_1
subbands is only equal to f_s/(4N_1) = Δf/4, whereas the mainlobe
width of the filter is Δf, so that a four-fold overlap of the FFT
process would be required to achieve the Nyquist sampling rate for
the subsequent tone extraction process in the channel processors.

The required demultiplex processing can be performed much more
efficiently by decimating the 4N_1-point FFT to eliminate those
butterfly operations that do not affect the spectral components
of interest. A processing arrangement to achieve that is shown in
Figure 5-3. In this scheme, an N_1-point FFT algorithm is cascaded
with a preprocessor. The combined process computes every fourth
line of a 4N_1-point FFT with fourfold overlap and raised cosine
amplitude weighting. The N_1-point FFT computes the upper quarter
of the rightmost rank of the conventional FFT array (i.e., every
fourth frequency component) and the preprocessor computes all the
butterfly operations to the left that thread into the upper right
quarter of the array. In the first stage of the preprocessor,
raised-cosine amplitude weighting is used to suppress the resolution
sidelobes to a level near -40 dB. Four separate preprocessor paths
are used in the amplitude weighting to fold the input samples into
four blocks of data containing 2N_1 time-domain samples that
represent the even spectral lines of 4N_1-point FFT's with raised-
cosine weighting and 75% time overlap. The overlap switches
alternately route a sequence of 2N_1 samples from a pair of first-
stage paths to the second stage of the preprocessor. The second-
stage process is carried out in two parallel paths in the same
manner as the first stage, except amplitude weighting is omitted.
The outputs of the second-stage processor paths are folded time
domain samples in blocks of N_1 samples that represent every fourth
spectral line of 4N_1-point FFT's, again with raised-cosine weighting
and 75% time overlap. The multiplexing switch alternately routes
a block of samples to the N_1-point FFT from the two second-stage
preprocessors. As a result, the time multiplexed output from the
N_1-point FFT provides output samples that represent the spectral
content of each of the multiplex filters shown in Figure 5-2. With
the fourfold time overlapped transforms, the output sample rate per
channel is equal to the subchannel Nyquist rate Δf. As shown by
the sidelobe levels in Figure 5-2, the above channel sampling rate
is adequate to reduce alias foldover to a level well below -30 dB.

As stated earlier, the raised cosine amplitude weighting will
provide sidelobe levels near -40 dB.
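The equivalence that the preprocessor relies on can be demonstrated in a few lines. The sketch below is a simplification under stated assumptions: it folds a weighted block of four times the FFT length in a single step rather than in the two stages of Figure 5-3, uses a direct DFT, and processes one block only (successive overlapped blocks would advance by one quarter of the block length). The function names are illustrative.

```python
import cmath
import math

def dft(x):
    """Direct DFT, standing in for the FFT."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def raised_cosine(length):
    """Raised-cosine amplitude weights over a block of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]

def fold(block, n_out):
    """Add the block onto itself in segments of n_out samples (time aliasing)."""
    segments = len(block) // n_out
    return [sum(block[n + r * n_out] for r in range(segments)) for n in range(n_out)]

def demux_lines(block, n1):
    """Every fourth line of the weighted 4*n1-point DFT, via one n1-point DFT."""
    weighted = [w * x for w, x in zip(raised_cosine(len(block)), block)]
    return dft(fold(weighted, n1))

def reference_lines(block):
    """The same lines taken from the full-length weighted DFT, for comparison."""
    weighted = [w * x for w, x in zip(raised_cosine(len(block)), block)]
    return dft(weighted)[::4]

N1 = 8
block = [cmath.exp(2j * math.pi * 0.23 * n) + 0.5 * cmath.exp(-2j * math.pi * 0.11 * n)
         for n in range(4 * N1)]
max_error = max(abs(a - b) for a, b in zip(demux_lines(block, N1), reference_lines(block)))
print(max_error)
```

The two results agree to rounding error, which is why the short transform preceded by weighting and folding can replace the long transform when only every fourth line is wanted.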
The processing operations in the preprocessor consist of five
complex additions and two half-complex multiplications (or
equivalently 10 real additions and four real multiplications) per complex
input sample, independent of N_1. The processing operations in the
cascaded N_1-point FFT are 3 log2 N_1 real additions and
2 log2 N_1 real multiplications per complex input sample.
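The operation counts above can be totalled for each FFT size as a quick check on the per-sample columns of Table 5-3. This is a minimal sketch; the function names are illustrative, and the 7/8 usable fraction is the 12.5% alias guard band allowance described earlier.

```python
import math

PREPROCESSOR_ADDS = 10    # real additions per complex input sample
PREPROCESSOR_MULTS = 4    # real multiplications per complex input sample

def demux_ops_per_sample(n1):
    """Total real (adds, multiplies) per complex input sample for FFT size n1."""
    stages = int(math.log2(n1))
    return PREPROCESSOR_ADDS + 3 * stages, PREPROCESSOR_MULTS + 2 * stages

def useful_channels(n1):
    """Channels left for traffic after the 12.5% alias guard band."""
    return (7 * n1) // 8

for n1 in (8, 16, 32, 64):
    print(useful_channels(n1), n1, *demux_ops_per_sample(n1))
```

For an 8-point FFT this gives 19 additions and 10 multiplications per sample with 7 useful channels, and each doubling of the FFT size adds 3 additions and 2 multiplications.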

Table 5-3 shows the processing requirements for the demultiplex
FFT filter as a function of the number of multiplexed channels.
The operations per sample can be scaled by the channel spacing
Δf = 1.6 KHz to obtain the processing speed per demultiplex output
channel. To account for the assumption that the alias guard band
at band edge makes only 7/8 of the N_1 slots useful for traffic, the
wasted slots were prorated to the other channels to obtain the per
channel processing speeds. In the final columns, the per channel
processing speeds are normalized to throughput in b/s by dividing
the previous results by the 75 b/s channel traffic rate.

5.1.2.2 Channel Signalling-Tone Processor

The channel signalling-tone processor is provided with a sample
sequence representing the signal and noise components out of the
corresponding demultiplex filter. The sampling rate is equal to
the channel spacing Δf, and corresponds to the spectral width of
the main lobe of the multiplex filter as shown in Figure 5-2. Prior
to performing the FFT to extract the frequency components, the
input sample sequence is heterodyned by the AFC frequency correction
to center the tone frequencies in the resolution cells of the FFT.
The AFC process is described in paragraph 5.1.2.3. The FFT
operation is synchronized to the waveform through the operation of an
early-late synchronizing loop that aligns the FFT channel window to
the waveform envelope. The early-late synchronizing process is
described in Section 5.1.2.4.

TABLE 5-3

DEMULTIPLEX FILTER FOR UNSYNCHRONIZED 8-ARY FSK CHANNELS

                      PROCESSING      PROCESSING*      PROCESSING*
  NO.        FFT      PER INPUT       PER INPUT        PER BIT
  CHANNELS   SIZE     SAMPLE          CHANNEL          (KOPS)
  M          N_1                      (KOPS/SEC)
                      ADD    MULT     ADD     MULT     ADD    MULT
  ----------------------------------------------------------------
  7          8        19     10       34.2    18.      .45    .24
  14         16       22     12       39.6    21.6     .52    .29
  28         32       25     14       45.     25.2     .60    .34
  56         64       28     16       50.4    28.8     .67    .38

*Indicated processing applies for a coding scheme with 3-fold
redundancy and 25% guard interval per frequency hop. Results should
be scaled by 1/4 to obtain values for a non-redundant transmission
scheme without guard time. Per channel processing is shown for
75 b/s channels.

The AFC corrected samples occur at a sampling rate of
Δf = 1.6 KHz, and the signalling tones are spaced by 1/T = Δf/16 = 100 Hz.
Therefore, an N = 16-point FFT is required to recover the eight
tone frequencies of interest. A 16-point FFT requires
(N/2) log2 N = 32 butterflies.

The  eight  signal  components  are  obtained  from  FFT  resolution 
cells  4 through  11.  From  Figure  5-2,  it  is  observed  that  the  spectral 
components  corresponding  to  the  different  tone  frequencies  are 
weighted  differently  owing  to  the  bandpass  shape  of  the  demultiplex 
filter.  Therefore,  the  last  stage  of  the  FFT  must  provide  amplitude 
weighting  to  equalize  the  amplitude  of  the  tone  components  (i.e., 
noise  whitening) . The  required  amplitude  scaling  can  be  programmed 
into  the  multiplication  coefficients  in  the  final  FFT  butterflies 
so  that  no  additional  arithmetic  operations  are  needed.  The 
required  processing  operations  for  the  16-point  FFT  are  shown  in 
Table  5-4. 
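The cell assignment and butterfly count for the tone filter can be sketched as follows. The example assumes a noiseless tone already centered in its resolution cell, uses a direct DFT in place of the FFT, and omits the amplitude equalization of the final stage; the mapping of symbol s to cell 4 + s is an assumption about ordering that the text does not spell out.

```python
import cmath

N_FFT = 16        # samples per symbol interval T at the 1.6 KHz channel rate
FIRST_CELL = 4    # the eight tones occupy resolution cells 4 through 11

def dft(x):
    """Direct DFT, standing in for the 16-point FFT."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def tone_samples(symbol):
    """One symbol interval of the tone for 8-ary symbol 0..7 (assumed cell 4 + symbol)."""
    cell = FIRST_CELL + symbol
    return [cmath.exp(2j * cmath.pi * cell * n / N_FFT) for n in range(N_FFT)]

def tone_energies(samples):
    """Energy in each of the eight signal cells, lowest tone first."""
    spectrum = dft(samples)
    return [abs(spectrum[FIRST_CELL + s]) ** 2 for s in range(8)]

def butterflies(n):
    """(n/2) log2 n butterflies for a radix-2 FFT of size n (n a power of two)."""
    return (n // 2) * (n.bit_length() - 1)

energies = tone_energies(tone_samples(5))
print(energies.index(max(energies)), butterflies(N_FFT))
```

Only eight of the sixteen output cells carry signalling tones; the remaining cells fall in the guard region between adjacent channels.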


5.1.2.3 Synchronization of Demultiplexer Output Channels

The demultiplexer provides output samples at the channel Nyquist
rate of Δf = 16/T = 1.6 KHz. These samples must be heterodyned by a
complex sinusoid to compensate for the frequency offset of the
channel, and thereby center the 8-ary FSK tone frequencies in the
resolution cells of the FFT tone filter. Likewise, the FFT tone
filter must be synchronized to the pulse envelopes of the channel
signalling. Both of these synchronization processes require
tracking loops to sense the offset error and to adjust the processor
to null the error. In conventional channel usage schemes, a fixed-
format sync preamble is transmitted ahead of the text to provide an
interval for the time and frequency tracking loops to acquire the
signal and resolve ambiguities in time and frequency. If the
preamble is chosen so that a known pair of tone frequencies are


TABLE 5-4

MFSK SIGNAL EXTRACTION FOR DEMULTIPLEXED ASYNCHRONOUS CHANNELS

           PROCESSING      PROCESSING*      PROCESSING*
  FFT      PER INPUT       PER INPUT        PER BIT
  SIZE     SAMPLE          CHANNEL          (KOPS)
  N                        (KOPS/SEC)
           ADD    MULT     ADD     MULT     ADD     MULT
  ------------------------------------------------------
  16       12     8        19.2    12.8     .256    .171

*Indicated processing applies for a coding scheme with 3-fold
redundancy and 25% guard interval for frequency hop. Results
should be scaled by 1/4 to obtain values for a non-redundant
transmission scheme without guard time. Per channel processing
is shown for 75 b/s channels.

transmitted alternately for sync, then a special matched-filter
processor can be devised to achieve sync. Once sync is achieved,
the timing and frequency corrections can be held for the duration
of the message.

Another  approach  to  synchronization  can  be  devised  that  does 
not  require  the  transmission  of  a preamble.  This  approach  is  of 
great  interest  for  use  in  LPI  report-back  channels  where  minimum 
transmission  time  is  advantageous.  In  this  scheme,  it  is  necessary 
to  buffer  the  demultiplexer  output  to  save  the  message  until  the 
appropriate  timing  and  frequency  offsets  can  be  derived,  at  which 
time  the  buffered  data  can  be  synchronously  demodulated.  The 
required  delay  in  the  buffer  will  generally  be  somewhat  greater 
than  the  duration  of  the  preamble  that  is  replaced  owing  to  the 
fact  that  the  message  structure  will  generally  not  be  of  optimum 
form  for  rapid  sync  recovery. 

Figure 5-4 shows a block diagram for a synchronization system
that can be used for channels that do not contain message preambles.
As shown, the first block of data appearing in the demultiplexer
output is processed twice in the FFT. The first FFT operation is
used to recover time and frequency synchronization. At the end of
the first block of L symbols (i.e., 16L samples), the input switch
of the FFT is activated so that the delayed (by L symbols)
demultiplexer output samples are routed to the FFT processor for a second
pass. These samples are synchronously demodulated with the
appropriate time and frequency adjustments as derived from the first
pass. The buffered data continues to be processed until the
demultiplexer output is again connected to the FFT processor to
initiate the synchronization cycle for the next message in the
channel.

5.1.2.3.1 AFC Control Loop

The  AFC  error  sensor  is  implemented  with  a 16-point  cross 
correlation  process  as  shown  in  Figure  5-4.  The  heterodyned 
samples  that  are  applied  to  the  FFT  are  delayed  by  two  symbol 
intervals  (32  input  samples)  to  permit  the  soft-decision  decoder 
to  determine  the  most  probable  transmitted  symbol  in  the  8-ary 
symbol  alphabet.  This  decision  is  used  to  select  one  of  eight 
cross-correlation  reference  signals  to  apply  to  the  16  input  samples 
that  were  used  by  the  FFT  to  derive  the  decision  variable. 

The eight cross-correlation reference signals, 4 ≤ m ≤ 11,
are the complex sample sequences


for 0 ≤ n ≤ 15, 4 ≤ m ≤ 11.

The cross correlation of the appropriate reference signal with
the sample sequence into the FFT produces a "discriminator" response
with a null at the center of the frequency cell. The
cross-correlation output is synchronously demodulated by the signal
component as derived by the FFT to produce a frequency error
indication that is linear with offset error. The processing requirements
for the AFC cross correlator correspond to one complex multiply
and one complex add per input sample. After the complex
cross-correlation metric is computed, the result is coherently demodulated
with a half-complex multiply performed once per symbol interval T
using the signal sample previously extracted with the FFT as the
demodulation reference. The real part of the demodulated error
signal is scaled by the servo loop gain and smoothed in a
recursive integrator (lag filter) to derive the offset frequency for
the heterodyning oscillator. This offset frequency is interpreted
as the phase slope per sample interval in the AFC heterodyning
oscillator. The heterodyning oscillator computes a new phase
angle once per sample interval in a recursive integrator using the
relationship θ(n) = θ(n-1) + Δθ, where Δθ is the phase slope per
input sample interval as obtained from the loop filter. The
computed phase can be truncated to about 5 significant bits to
obtain an address for accessing a small sine/cosine table to obtain
each complex sample of the heterodyning sequence. The heterodyning
sequence does not require great precision; however, the phase
integrator requires sufficient precision to provide an AFC frequency
step increment that is narrow compared to the resolution bandwidth
of the FFT that follows the heterodyning mixer. Referenced to the
sample rate f_s = 1.6 KHz, the use of eight-bit precision in the
phase integrator would provide an AFC step size of 6.25 Hz, or 1/16 of
the FFT resolution width.
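The heterodyning oscillator described above amounts to a phase accumulator followed by a table lookup. The sketch below assumes an 8-bit unsigned accumulator that wraps modulo 2^8 and a 32-entry table addressed by the top 5 bits; the integer scaling of the phase slope and all names are illustrative choices rather than details taken from the text.

```python
import cmath
import math

PHASE_BITS = 8            # word length of the phase integrator
TABLE_BITS = 5            # bits of phase used to address the sine/cosine table
SAMPLE_RATE_HZ = 1600.0   # channel sample rate f_s

# Small sine/cosine table: one complex entry per table address.
TABLE = [cmath.exp(2j * math.pi * k / (1 << TABLE_BITS)) for k in range(1 << TABLE_BITS)]

def afc_step_hz(phase_bits=PHASE_BITS, sample_rate_hz=SAMPLE_RATE_HZ):
    """Smallest frequency increment: f_s divided by 2**phase_bits."""
    return sample_rate_hz / (1 << phase_bits)

def heterodyne_sequence(delta_theta, count):
    """Complex oscillator samples for an integer phase slope per sample.

    delta_theta is expressed in units of 2*pi / 2**PHASE_BITS and may be
    negative; the accumulator wraps modulo 2**PHASE_BITS.
    """
    phase = 0
    out = []
    for _ in range(count):
        out.append(TABLE[phase >> (PHASE_BITS - TABLE_BITS)])
        phase = (phase + delta_theta) % (1 << PHASE_BITS)  # theta(n) = theta(n-1) + delta
    return out

print(afc_step_hz())
print(heterodyne_sequence(16, 4))
```

A phase slope of 16 units corresponds to sixteen steps of f_s / 2^8, that is, one full FFT resolution cell, so the step size is one sixteenth of the resolution width.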

5.1.2.3.2 Early/Late Control Loop

The early/late control loop operates in essentially the same
manner as described for the AFC loop with the following exceptions:

(1) The correlation reference signals are defined over the
set of alternative signal tones 4 ≤ m ≤ 11 by

        a_n exp(-jπmn/8),   0 ≤ n ≤ 15,

    where

        a_n = -1,   0 ≤ n ≤ 7
        a_n = +1,   8 ≤ n ≤ 15

(2) The heterodyning oscillator is replaced by an error
threshold and a timebase increment/decrement process.

Because there are 16 input samples in the signal interval T,
the early/late sync loop can align system timing to within 6.25% of
the ideal sync timing. A timing offset of 6.25% introduces a
maximum loss in carrier to noise ratio of about 0.5 dB, and a
negligible intersymbol spillover. Better accuracy could be achieved by
using an interpolation filter, but the performance gain would be
slight and the processing requirements would be nearly doubled.

As  in  the  case  of  the  AFC  loop,  the  appropriate  correlation 
reference  is  selected  by  the  soft  decision  processor  and  the  output 
from  the  cross  correlation  process  is  synchronously  demodulated 
using  the  previously  derived  signal  component  from  the  FFT  as  the 
demodulation  reference.  After  the  demodulated  early/late  error  is 
smoothed  in  the  integrator,  the  resultant  is  applied  to  a threshold 
detector.  When  the  error  magnitude  exceeds  the  threshold,  the  time 
base  counter  (divide  by  16)  is  incremented  or  decremented  as 
appropriate  to  reduce  the  timing  offset  error.  At  the  same  time, 
the  integrator  is  cleared  and  the  loop  operates  on  the  new  timing 
offset  until  the  integrator  output  next  exceeds  the  threshold. 

Under  steady-state  conditions,  the  loop  timing  oscillates  between 
the  discrete  timing  offsets  that  bracket  the  ideal  timing.  During 
acquisition,  the  pull-in  behavior  will  closely  approximate  that  of 
a linear  first  order  servo,  except  that  there  will  tend  to  be  an 
overshoot  of  one  sample  interval. 
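The threshold-and-step behavior of the timing loop can be sketched as a small state machine. The example assumes a plain accumulator as the smoothing integrator, a sign convention in which positive error advances the time base, and arbitrary gain and threshold values; these are illustrative assumptions, since the text gives no numerical loop parameters.

```python
class EarlyLateLoop:
    """Integrate the early/late error and step a divide-by-16 time base."""

    def __init__(self, gain, threshold, modulus=16):
        self.gain = gain            # loop gain applied to each error sample
        self.threshold = threshold  # magnitude that triggers a timing step
        self.modulus = modulus      # time base counter length (divide by 16)
        self.integrator = 0.0
        self.timebase = 0

    def update(self, error):
        """Process one demodulated early/late error (one per symbol)."""
        self.integrator += self.gain * error
        if abs(self.integrator) > self.threshold:
            step = 1 if self.integrator > 0 else -1
            self.timebase = (self.timebase + step) % self.modulus
            self.integrator = 0.0   # cleared whenever the time base is stepped
        return self.timebase

loop = EarlyLateLoop(gain=0.5, threshold=1.0)
history = [loop.update(e) for e in (1.0, 1.0, 1.0, -1.0, -1.0, -1.0)]
print(history)
```

With a persistent error of one sign the counter steps one sample at a time toward the correct timing; once the error changes sign the loop steps back, which corresponds to the steady-state oscillation between the two bracketing offsets.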

5.1.2.3.3 Processing Summary for AFC/SYNC

Table 5-5 shows the processing requirements for AFC and time
synchronization broken out by the individual processing blocks of
Figure 5-4. Each of the two cross-correlation operations corresponds
to one full-complex multiply and one complex add per input sample
(i.e., 4 real multiplications and 4 real additions). The
synchronous demodulation of the error signals requires half (the real part)
of a full complex multiply performed once per symbol (i.e., every
16 input samples). The integrator and loop gain scaling together

TABLE 5-5

AFC/SYNC PROCESSING

                              PROCESSING       PROCESSING*      PROCESSING*
                              PER INPUT        PER INPUT        PER B/S
  OPERATION                   SAMPLE           CHANNEL          (OPS)
                                               (KOPS/SEC)
                              ADD     MULT     ADD     MULT     ADD     MULT
  --------------------------------------------------------------------------
  16-Point Correlations       4       8        6.4     12.8     85.3    170.6
  (AFC & Sync)
  Synchronous Demodulate      0.25    0.125    0.4     0.2      5.3     2.6
  (AFC & Sync)
  Loop Integrator             0.125   0.125    0.2     0.2      2.6     2.6
  (AFC & Sync)
  Heterodyne Oscillator       1       -        1.6     -        21.3    -
  (AFC)
  Early/Late Threshold        0.5     -        0.8     -        10.6    -
  Compare (Sync)
  Mixer (AFC)                 2       4        3.2     6.4      42.7    85.3

*Indicated processing applies for a coding scheme with 3-fold
redundancy and 25% guard interval per frequency hop. Results
should be scaled by 1/4 to obtain values for a non-redundant
transmission scheme without guard time. Per channel processing
is shown for 75 b/s channels.

require one real multiply and one real addition per symbol. The
heterodyner oscillator requires one full complex multiply per input
sample (4 real multiplies and 2 real additions).

5.1.2.4 Soft-Decision Processing

The soft-decision processor computes decision metrics for each
of the eight alternative signal hypotheses on the basis of the
received energy in the eight tone locations. There are a wide
variety of ways that the envelope metrics can be mapped into decision
metrics, and it is well known that the optimum transformation is
that which produces the log-likelihood ratio. Unfortunately, the
log-likelihood transformation is a function of the channel noise
statistics and cannot be rigorously defined for channels in which
the noise characteristics are not known, as in jamming or self
interference environments. Under these conditions, a
transformation is required that is robust under impulsive noise conditions
and still provides near optimum performance for white Gaussian
noise channels.

One such transformation that has proven very effective in
this regard is envelope rank listing. In this process, the envelopes
associated with each of the eight signal alternatives are compared,
and each of the alternatives is assigned a number from 0 to 7,
depending on the relative rank of its envelope as compared with all
other alternatives. Thus, the largest envelope is assigned a value
of 7 and the smallest a value of 0. Each of the numbers 0-7 is
assigned to an alternative, and no two alternatives receive the
same number. The process as described can result in 8! = 40,320
different assignment combinations. It is impractical to test all
of these possibilities directly for each soft decision, so a
more systematic method of arriving at the result with a minimum of
arithmetic operations is required.

One efficient algorithm for accomplishing the rank listing is
based on systematic merging of two rank-listed chains so that the
merged chain is also rank listed. The process is illustrated in
Figures 5-5 and 5-6. The process starts by comparing the leading
members of each chain. The contender with the greater magnitude
(say, the leading member of chain A) is selected as the highest
ranked member of merged chain C. With this procedure, the four
chains of length 2 require a total of 4 passes, the two chains of
length 4 require a total of 6 passes, and the final chain of
length 8 requires 7 passes. The whole process then requires 17
passes, which is equivalent to 170 instruction executions. Allowing
30 executions for initialization and envelope calculation (e.g., by
truncating the real and imaginary terms to four bits each and using
a table look-up) gives a total of 200 equivalent additions per soft
decision.
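
The chain-merging procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WBSP microroutine: the names merge_chains and rank_list are hypothetical, the envelopes are taken to be plain numbers, the number of envelopes is assumed to be a power of two, and ties are resolved in favor of the first chain (the text does not say how ties are handled). The pass counter illustrates why eight envelopes never need more than 4 + 6 + 7 = 17 comparison passes.

```python
def merge_chains(a, b):
    """Merge two descending chains of (envelope, index) pairs.

    Returns the merged descending chain and the number of comparison
    passes used. Ties go to chain a (an assumption of this sketch).
    """
    merged, passes = [], 0
    i = j = 0
    while i < len(a) and j < len(b):
        passes += 1                      # one pass = one leading-member comparison
        if a[i][0] >= b[j][0]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    merged.extend(a[i:])                 # one chain is exhausted; copy the rest
    merged.extend(b[j:])
    return merged, passes


def rank_list(envelopes):
    """Assign ranks 0..n-1 (largest envelope gets n-1) by repeated merging."""
    chains = [[(e, k)] for k, e in enumerate(envelopes)]
    total_passes = 0
    while len(chains) > 1:
        next_chains = []
        for k in range(0, len(chains), 2):
            merged, passes = merge_chains(chains[k], chains[k + 1])
            next_chains.append(merged)
            total_passes += passes
        chains = next_chains
    ranks = [0] * len(envelopes)
    for position, (_, index) in enumerate(chains[0]):
        ranks[index] = len(envelopes) - 1 - position
    return ranks, total_passes


if __name__ == "__main__":
    ranks, passes = rank_list([3, 9, 1, 7, 5, 2, 8, 4])
    print(ranks, passes)
```

Merging two chains of lengths m and n needs at most m + n - 1 comparisons, which is where the per-stage totals of 4, 6 and 7 passes come from.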

Table  5-6  shows  the  processing  requirements  for  soft-decision 
detection  for  synchronized  and  unsynchronized  MFSK  channels.  The 
instructions  executed  are  a mix  of  arithmetic  and  conditional  jump 
operations,  but  no  multiplications  are  involved  so  the  processing 
load  would  be  equivalent  to  the  indicated  number  of  additions. 

Chip combining is accomplished by summing the soft-decision
metrics over a number of chip intervals corresponding to the chip
locations where the same symbol is repeated. If the soft-decision
metrics were truly log-likelihood ratios, then the combined output
would correspond to the log-likelihood ratio for the combined chips.
The rank-list soft-decision process produces 3-bit numbers that are
an approximation to the true log-likelihood ratio.

Combining the soft-decision metrics from four chips produces
numbers that range over the values 0 ≤ U ≤ 28, so that the output of
the chip combiner must be represented by a five-bit number to avoid
further loss of the information content contained in the envelope
amplitudes in the four chips composing the combined output. The
chip combining process requires only 8 additions per chip interval,
which is a negligible contribution to the overall demodulation
process.

TABLE 5-6

MFSK SOFT DECISION DETECTION

                          INSTRUCTIONS   SYMBOL    PROCESSING OPERATIONS    PROCESSING OPERATIONS
CHANNEL TYPE              PER SYMBOL     RATE      PER CHANNEL (KOPS/SEC)   PER BIT (KOPS)

Fully Synchronized MFSK   200            200*      50*                      0.53*
Unsynchronized MFSK       200            100**     20**                     0.27**

*Indicated processing applies to chip combining/coding arrangement
with 8-fold redundancy. For non-redundant transmission schemes
operations should be scaled by 1/8. Per channel processing is
shown for 75 b/s channels.

**Indicated processing applies for a coding scheme with 3-fold
redundancy and 25% guard interval per frequency hop. Results
should be scaled by 1/4 to obtain values for a non-redundant
transmission scheme without guard time. Per channel processing
is shown for 75 b/s channels.
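
As a rough illustration of chip combining, the sketch below sums the rank metrics (0 to 7) of each of the eight alternatives over four chips. The function name combine_chips and the list-of-lists input layout are assumptions made for this example only. It shows why the combined value never exceeds 28 and therefore fits in five bits, and that each chip costs eight additions.

```python
def combine_chips(chip_metrics):
    """Sum per-alternative rank metrics over several chip intervals.

    chip_metrics is a list with one entry per chip; each entry holds the
    eight rank metrics (a permutation of 0..7) for that chip.
    """
    combined = [0] * 8
    for metrics in chip_metrics:
        if sorted(metrics) != list(range(8)):
            raise ValueError("each chip must carry the ranks 0..7 exactly once")
        for k, metric in enumerate(metrics):
            combined[k] += metric        # 8 additions per chip interval
    return combined


if __name__ == "__main__":
    chips = [
        [7, 6, 5, 4, 3, 2, 1, 0],
        [0, 1, 2, 3, 4, 5, 6, 7],
        [7, 0, 1, 2, 3, 4, 5, 6],
        [7, 6, 5, 4, 3, 2, 1, 0],
    ]
    combined = combine_chips(chips)
    decision = combined.index(max(combined))   # most likely alternative
    print(combined, decision)
```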

5.2 DEMODULATOR PROCESSOR IMPLEMENTATION

The Demodulator processor postulated for the WBSP can be
regarded as two core processors, configured to run in parallel,
combined with a 512-word by 22-bit sample-point memory. (The reason
for this 22-bit memory width is explained below.) In addition, each
core processor is provided with an 8 x 8 parallel multiplier.
(It is anticipated that a currently existing multiplier chip can
be used for this purpose.) The core processor pair, however, is
less complex than two independent processors for two reasons:
1) They share the same bus interface. 2) They perform essentially
the same operations at the same time; thus, most fields of the
control RAM can be shared. Specifically, the 40-bit-wide control
RAM needed for a single processor is expanded to 56 bits for the
dual processor.

The  sample-point  memory  is  organized  into  512  words,  each 
word  divided  into  a 9-bit  real  and  a 9-bit  imaginary  part  to  which 
is  appended  an  overall  parity  bit.  Three  spare  bits  and  a bit 
rippler  are  added  for  fault  tolerance. 
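
The 22-bit word layout (9 + 9 information bits, 1 parity bit, 3 spare bits) can be illustrated with a short sketch. The bit ordering, the use of even parity, and the names pack_word and unpack_word are assumptions made for illustration; the text specifies only the field widths.

```python
PART_BITS = 9                      # width of the real and of the imaginary field
INFO_BITS = 2 * PART_BITS          # 18 information bits
WORD_BITS = INFO_BITS + 1 + 3      # plus 1 parity bit and 3 spare bits = 22


def pack_word(real, imag):
    """Pack two 9-bit fields and an even-parity bit; spare bits stay zero.

    Assumed layout: bit 0 = parity, bits 1..18 = information,
    bits 19..21 = spares.
    """
    limit = 1 << PART_BITS
    if not (0 <= real < limit and 0 <= imag < limit):
        raise ValueError("fields must fit in 9 bits")
    info = (real << PART_BITS) | imag
    parity = bin(info).count("1") & 1          # even parity over the 18 bits
    return (info << 1) | parity


def unpack_word(word):
    """Check parity and return the (real, imag) fields."""
    parity = word & 1
    info = (word >> 1) & ((1 << INFO_BITS) - 1)
    if bin(info).count("1") & 1 != parity:
        raise ValueError("parity error")
    return info >> PART_BITS, info & ((1 << PART_BITS) - 1)


if __name__ == "__main__":
    word = pack_word(5, 300)
    print(f"{word:0{WORD_BITS}b}", unpack_word(word))
```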

The 9-bit word length was chosen on the basis of a quantization
noise simulation described in Appendix A. It was found in that
simulation that the signal-to-quantization-noise ratio associated
with FFT processing ranged between 33 and 40 dB under a wide
variety of conditions when the data were truncated to 8 bits, a
ratio felt to be entirely acceptable. The quantization noise was
kept to that level by dividing every data value by two after each
FFT iteration if, and only if, at least one butterfly operation
resulted in an overflow. (Note that a butterfly operation can at
most double the magnitude of its inputs.) This in effect requires
the data to be represented in a floating-point format with a single-bit
exponent used to indicate whether or not an overflow occurred
during the previous iteration. (An alternative method would be to
access all previously processed data and divide each data point
by two as soon as the first overflow occurred during any iteration,
an obviously time-consuming procedure.) Accordingly, 9 bits of
memory are allowed for each data point, with the 9th bit used to
store the associated overflow bit.

The memory control chip described in Section 4 can be used
to control the sample-point memory as well. The address shuffler
is readily modified to implement the much simpler addressing sequence
required for an FFT, and most of the rest of the logic in that chip
is applicable as well. In addition, a "normalizer" can easily be
integrated onto that chip, so that the divide-by-two operation can
occur automatically, as the data are passed to the processor,
whenever an overflow occurred during the previous FFT iteration.
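
The conditional divide-by-two scheme can be illustrated with a simplified software model. The sketch below is not a bit-exact model of the Demodulator processor: it uses Python integers (which never overflow), computes twiddle factors with floating-point math rather than a sine/cosine table, truncates with int(), and leaves the output in bit-reversed order. The names dif_stage, block_scaled_fft and LIMIT are hypothetical. It only shows the control flow: an overflow detected in one iteration causes every value to be halved as it is read for the next iteration.

```python
import math

LIMIT = 127  # largest magnitude of an 8-bit two's-complement part (assumed)


def dif_stage(data, half, scale_inputs):
    """Run one radix-2 decimation-in-frequency pass over (re, im) integer pairs.

    If scale_inputs is set, every value is halved on the way in, which
    plays the role of the "normalizer". Returns (new_data, overflow_flag).
    """
    if scale_inputs:
        data = [(re >> 1, im >> 1) for re, im in data]
    out = list(data)
    overflow = False
    for start in range(0, len(data), 2 * half):
        for k in range(half):
            (ar, ai), (br, bi) = data[start + k], data[start + k + half]
            sr, si = ar + br, ai + bi                 # sum output
            dr, di = ar - br, ai - bi                 # difference output
            angle = -math.pi * k / half               # twiddle exp(-j*pi*k/half)
            c, s = math.cos(angle), math.sin(angle)
            tr, ti = int(dr * c - di * s), int(dr * s + di * c)
            out[start + k] = (sr, si)
            out[start + k + half] = (tr, ti)
            if max(abs(sr), abs(si), abs(tr), abs(ti)) > LIMIT:
                overflow = True                       # would set the 9th bit
    return out, overflow


def block_scaled_fft(samples):
    """Return (data, shifts, pending): true spectrum ~ data * 2**shifts."""
    data = list(samples)
    shifts = 0
    pending = False
    half = len(data) // 2
    while half >= 1:
        if pending:
            shifts += 1                               # halving applied this pass
        data, pending = dif_stage(data, half, pending)
        half //= 2
    return data, shifts, pending


if __name__ == "__main__":
    spectrum, shifts, pending = block_scaled_fft([(100, 0)] * 8)
    print(spectrum[0], shifts, pending)
```

For a constant input of 100 over eight points, every iteration overflows, two halvings are applied, and the stored DC term of 200 corresponds to 200 x 2^2 = 800, the exact DFT value.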

The Demodulator processor operates in two phases when demodulating
MFSK data. (The much simpler QPSK cross-link data demodulation
process can be implemented using a single-phase procedure.) During
phase 1, the processor's sample-point memory is loaded with the
data samples taken during the current symbol interval. At the
same time, soft-decision outputs are calculated on the basis of
the previously transformed samples and transferred to the
Interleaver/De-interleaver processor. The data stored during phase 1
are then processed in accordance with the algorithm described in
paragraph 5.2 (except for the soft-decision portion of that
algorithm) during phase 2. The dual processor accepts both the
real and imaginary portions of two sample-point memory outputs,
performs the appropriate butterfly operation, and sends the results
back to the sample-point memory to be stored for the next iteration.

Two Demodulator processors are required for each FDM MFSK
channel, one implementing phase 1 while the other implements
phase 2. The two processors switch roles after each symbol interval.
To verify that a pair of such processors can indeed implement
the algorithms described in paragraph 5.2, recall that, when
[symbol illegible] = 16, asynchronous demodulation (the more complex
case) requires a total of 71,400 equivalent additions and 54,000
multiplications per second per channel to implement all of the
phase 2 operations. (Many fewer operations are required during
phase 1.) Since these numbers are based on fourteen channels (and
are not substantially reduced if only twelve channels are present),
approximately 10^6 equivalent adds and 3/4 x 10^6 multiplies are
needed each second to demodulate one asynchronous FDM MFSK signal.
If the symbol interval is halved (to 5 msec), these rates are
essentially doubled. Since the processor contains hardware
multipliers, and additions and multiplications both require the same
amount of time (one micro-cycle), the total number of operations
needed per second at the 5 msec symbol rate is thus 3.5 x 10^6. This
number must be increased by a factor of approximately two to account
for the time needed to access and store the data, for iteration loop
overhead, etc. Thus, the Demodulator processor must be able to
execute microinstructions at a 7 x 10^6 instruction/second rate if
it is to handle an asynchronous MFSK FDM channel in the manner
described here. Since a core processor can execute microinstructions
at a 6.67 x 10^6 instruction/second rate and since a Demodulator
processor consists of two core processors operating in parallel, it
can clearly implement the required processing algorithm with a
comfortable margin.
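
The sizing argument above can be checked with a few lines of arithmetic. The sketch below simply restates the figures quoted in the text; the variable names are illustrative, and the 10 msec baseline symbol interval is inferred from the statement that halving it gives 5 msec.

```python
CHANNELS = 14                        # channels per FDM MFSK signal
ADDS_PER_SEC_PER_CHANNEL = 71_400    # equivalent additions, phase 2
MULTS_PER_SEC_PER_CHANNEL = 54_000   # multiplications, phase 2
OVERHEAD_FACTOR = 2                  # data access, storage and loop overhead
CORE_RATE = 6.67e6                   # microinstructions per second per core
CORES = 2                            # core processors per Demodulator processor

adds = CHANNELS * ADDS_PER_SEC_PER_CHANNEL       # about 10^6
mults = CHANNELS * MULTS_PER_SEC_PER_CHANNEL     # about 3/4 x 10^6
ops_10ms = adds + mults                          # baseline symbol interval
ops_5ms = 2 * ops_10ms                           # halved symbol interval
required = OVERHEAD_FACTOR * ops_5ms             # about 7 x 10^6
available = CORES * CORE_RATE
margin = available / required

if __name__ == "__main__":
    print(f"adds/s = {adds}, mults/s = {mults}")
    print(f"required = {required} instr/s, available = {available:.0f} instr/s")
    print(f"margin = {margin:.2f}x")
```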
6.0  RELIABILITY,  POWER,  WEIGHT,  VOLUME  AND  THROUGHPUT  SUMMARY

6.1  WBSP  SIZING 

Table 6-1 lists the functional assignments of the various
Demodulator, Interleaver/De-interleaver, and Encoder/Decoder
processors needed to configure the WBSP for anticipated SSS
applications. The number and types of LSI devices used to implement
each of these three processor types are also summarized in
Table 6-1.

6.2  FTSC  SIZING 

The  FTSC  is  modular  in  design  so  that  the  number  of  memory 
blocks and I/O ports can be tailored to each specific application.
As indicated in Figure 2-1, two I/O ports, one parallel and one
serial  port,  are  postulated  for  the  SSS  configuration.  It  is 
estimated  that  three  active  4096-word  memory  blocks  are  required 
to  store  all  the  program,  constants  and  variable  data  needed  for 
the  SSS  mission.  This  estimate  was  based  on  a detailed  examination 
of  the  memory  requirement  associated  with  each  of  the  tasks  to  be 
performed.  The  results  of  this  examination  are  summarized  in  Table 
6-2. 

Also  shown  in  Table  6-2  is  the  maximum  percentage  of  the  FTSC's 
total  throughput  required  to  perform  each  of  the  identified  SSS 
tasks.  As  can  be  seen,  the  presently  identified  tasks  leave  the 
FTSC  with  a throughput  margin  adequate  for  future  growth. 

6.3  RELIABILITY  ANALYSIS 

Reliability analyses were conducted for both the WBSP and the
FTSC. For purposes of these analyses, an LSI device hazard rate
of 10^-7 failures per hour was postulated and a dormancy factor (ratio
of active to dormant hazard rates) of 1.1 was used where applicable.
TABLE 6-1

PROCESSOR COMPLEXITY AND UTILIZATION SUMMARY

b)  Interleaver/De-interleaver Processor

COMPOSITION

Element                                   No. of Devices

Core Processor                                  21
Intl. Mem.*                                     84
Memory Control                                   1
Bit Rippler                                      2
TOTAL                                          108

UTILIZATION

Report Back De-Intl.                             4
FWD. Link De-Intl.                               3
EHF Link De-Intl.                                1
Interleaver                                      1
TOTAL NEEDED                                     9

*3072 words of 24 information bits, 1 parity bit and
3 spare bits each: 84 1024 x 1 RAM chips.


c)  Demodulator Processor

COMPOSITION

Element                                   No. of Devices

Core Processor                                  21
2nd RALU                                         2
2nd I/O Buffer and Control Sequencer             1
2nd Data RAM                                     2
Dual Multipliers                                 2
Expanded Control RAM                             4
Sample Point Memory*                            11
Memory Control                                   1
Bit Rippler                                      1
TOTAL                                           45

UTILIZATION

Report Back Demod                                2
FWD and EHF Demod                                2
Cross-Link Demod                                 1
TOTAL NEEDED                                     5

*512 words of 18 information bits, 1 parity bit and
3 spare bits each.
TABLE 6-2

FTSC LOADING FOR SSS


RAYTHEON COMPANY
EQUIPMENT DIVISION

The predicted 7-year reliability of the proposed WBSP configuration
is .9654. The basis for this prediction is shown in
Table 6-3, which lists the complexities and redundancies for each
of the elements comprising the WBSP as well as the resulting 7-year
reliability of each element. The complexity of each element
is indicated in terms of the number of equivalent LSI devices.
(The hazard rate is therefore equal to the complexity multiplied
by 10^-7.) The complexity of an element may differ slightly from
its actual device count due to interconnection complexity factors
and due to the fact that a portion of the failures occurring at the
bus interfaces appear as bus (byte) failures rather than failures
in the element associated with that interface.

The  7-year  reliability  of  the  FTSC  is  predicted  to  be  .9592. 
This  prediction  is  based  on  the  reliability  model  used  for  all 
FTSC  reliability  calculations  but  with  the  following  exceptions: 
six,  rather  than  four,  CPUs  were  used  in  order  to  increase  the 
mission  duration  from  five  to  seven  years;  the  second  DMA  port  was 
eliminated,  since  no  need  for  it  has  been  established. 

6.4  RELIABILITY,  POWER,  WEIGHT  AND  VOLUME  SUMMARY 

The  estimated  power  requirements  of  the  various  WBSP  elements 
are  listed  in  Table  6-3.  The  estimated  power  consumption  of  the 
FTSC  in  the  configuration  proposed  here  is  21.2  watts. 

Weight and volume estimates for the WBSP were made under the
assumption that it would use the packaging techniques developed
for the FTSC. The smaller (4.3" x 10.0") of the two FTSC
modules was assumed for all WBSP elements; the number of modules
needed to accommodate each of the WBSP elements under these conditions
is also indicated in Table 6-3. The FTSC weight and volume
estimates were extracted from results derived during that program.
TABLE 6-3

WBSP RELIABILITY AND POWER SUMMARY

                                  # Avail./    Reliability    # Modules    Power
Element            Complexity     # Needed     (7 yrs.)       Required     Required

Bus Controller        34.5        2/1          .9946          1            1.9
Bus 1                  2.63       3/2          .9992          -            -
Bus 2                  3.13       3/2          .9989          -            -
Bus 3                  2.75       3/2          .9992          -            -
Bus 4                  2.0        3/2          .9996          -            -
Demodulator           44.5        9/5          .9943          4 1/2        3.8
Deinterleaver        108.0        14/9         .9922          14           8.3
Decoder               20.5        6/3          .9976          1 1/2        2.5
A/D Conv.              6.0        (2/1)^5      .9935          1            3.0
Modulator              4.6        (2/1)^4      .9969          0.9          1.2
Timing Generator       3.5        2/1          .9995          0.1          0.4
Power Supply           9.8        2/1          .9966                       3.7

TOTAL REL.                                     .9627
TOTAL # MODULES                                               23
TOTAL POWER REQUIREMENTS                                                   24.8 Watts
The results of these calculations are summarized in Table 6-4.
The first two rows in that table give parameters for the WBSP and
the FTSC when the two units are packaged individually. The third
row gives the same parameters for the WBSP and FTSC packaged as one
unit. If they are to be packaged individually, the reliability and
power totals remain unchanged but the weight and volume of the two
individual units are just the sums of the values shown in the first
and second rows.

TABLE 6-4

RELIABILITY, POWER, WEIGHT, AND VOLUME SUMMARY

           7 yr. Reliability    Power (watts)    Weight (lbs)    Volume (ft^3)

WBSP            .9627               24.8            36.0             0.52
FTSC            .9592               21.2         [illegible]         0.65
TOTAL           .9234               46.0            67.7             1.05
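
The tabulated totals can be cross-checked numerically. The sketch below assumes a series model, i.e. that the WBSP survives only if every listed element survives and that element failures are independent; this assumption is not stated explicitly in the text, but it reproduces the tabulated totals. The variable names are illustrative.

```python
import math

# 7-year element reliabilities as listed in Table 6-3
element_reliability = {
    "Bus Controller": 0.9946,
    "Bus 1": 0.9992,
    "Bus 2": 0.9989,
    "Bus 3": 0.9992,
    "Bus 4": 0.9996,
    "Demodulator": 0.9943,
    "Deinterleaver": 0.9922,
    "Decoder": 0.9976,
    "A/D Conv.": 0.9935,
    "Modulator": 0.9969,
    "Timing Generator": 0.9995,
    "Power Supply": 0.9966,
}

FTSC_RELIABILITY = 0.9592            # from Table 6-4

# Series assumption: all elements must survive, failures independent
wbsp_reliability = math.prod(element_reliability.values())
combined_reliability = round(wbsp_reliability, 4) * FTSC_RELIABILITY

if __name__ == "__main__":
    print(f"WBSP:        {wbsp_reliability:.4f}")
    print(f"WBSP + FTSC: {combined_reliability:.4f}")
```

Under this assumption the product of the element reliabilities rounds to .9627, and multiplying by the FTSC figure gives .9234, in agreement with the TOTAL entries of Tables 6-3 and 6-4.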

7.0  CONCLUSIONS  AND  RECOMMENDATIONS 

The  results  of  the  WBSP  study  thus  far  have  demonstrated  that 
the  fault  tolerance  techniques  developed  for  the  FTSC  can  also  be 
used  effectively  to  configure  a highly  reliable  communications 
processor  of  the  sort  needed  for  SSS.  The  same  structure  that 
gives  the  signal  processor  a high  reliability-improvement-to- 
redundancy  ratio  (i.e.,  its  multiple,  reassignable  processor 
configuration) also makes it unusually versatile. Coding or modulation
techniques can be changed, frame lengths can be modified, and
channels can be added or deleted without changing the WBSP hardware
design; generally, such changes can be accommodated by making software
changes and by adding or removing some number of processing
elements.

The  cost  of  developing  the  proposed  WBSP  should  be  much  lower 
than  normal  for  a system  of  this  complexity  since  so  much  of  the 
work  is  already  being  done  for  the  FTSC.  All  but  three  of  the 
LSI  devices  identified  for  the  WBSP  are  being  developed  for  the  FTSC. 
These devices will be subjected to extensive reliability and hardness
testing as part of the FTSC program, thus providing the data
base needed for the WBSP as well. Moreover, all of the difficult
design problems associated with a fault-tolerant system (e.g.,
fault detection and isolation) have already been addressed in the
FTSC design. The solutions developed there (e.g., the bus structure)
are being used intact in the WBSP; thus, the WBSP development effort
can  concentrate  on  the  more  tractable  problem  of  designing  the 
individual  processors  themselves. 

The  proposed  structure  for  the  WBSP  has  been  defined  during  the 
course  of  the  present  study,  and  reasonably  detailed  designs  have 
been  developed  for  each  of  the  three  processor  types  needed  for  its 
implementation.  Considerable  work  remains  to  be  done,  however,  to 
demonstrate the concept thoroughly. To this end, the following
two-phase  effort  is  proposed: 


Phase I  DESIGN AND VERIFICATION

Task 1:  Define remaining microroutines:

a)  Synchronous FDM MFSK demodulator

b)  Asynchronous FDM MFSK demodulator

c)  TDM QPSK demodulator

d)  De-interleaver

e)  Interleaver

f)  Reed-Solomon Decoder

These microroutines should be defined in a manner similar to
that described in Section 3; register utilization and scratch-pad
memory partitioning must also be determined.

Task  2:  Complete  detailed  LSI  chip  design: 

a)  I/O  Buffer  and  Control  RAM  Sequencer 

b)  Memory  Control 

Both  these  chips  have  been  defined  functionally;  this  functional 
description  must  now  be  translated  into  detailed  logic  designs. 

Task  3:  Design  Bus  Controller 

The  Bus  Controller  design  should  be  relatively  straightforward 
compared  to  the  processor  designs  comprising  the  bulk  of  the  effort 
in  the  present  study.  Nevertheless,  the  Bus  Controller  design  must 
obviously  be  completed  if  the  WBSP  is  to  function  as  a unit.  It 
would  appear  that  the  Memory  Control  chip  to  be  designed  in  Task  2 
could  easily  be  used  to  control  the  Bus  Controller  memory  as  well; 
this  possibility  should  be  investigated. 

Task 4:  Design A/D Converter Module

The  A/D  Converter  Module  postulated  for  the  WBSP  consists  of  a 
relatively low-speed (<100 kHz) converter coupled with bus interface
logic. A monolithic A/D converter apparently suitable for
this purpose has already been identified, so this design should
pose no serious problems. An investigation should be made into the
advisability of preceding the converter with a prescaler, so that
the signal gain can be changed under demodulator control.

Task  5:  Design  Modulator  Module 

The  Modulator  Module  postulated  for  the  WBSP  simply  translates 
each  digital  symbol  received  over  the  WBSP  bus  into  one  of  four 
(QPSK)  or  eight  (MFSK)  discrete  levels  defining  the  subcarrier 
phase or frequency appropriate for each FDM or TDM channel. The
design of such a device is obviously straightforward. It is, of
course, possible to design the modulator to generate the subcarriers
directly, and hence to reduce the complexity of the subsequent
analog hardware at the cost of increased modulator
complexity.  Either  special-purpose  or  general-purpose  digital 
modulators  could  be  used  in  this  capacity;  in  the  latter  case,  a 
modulator  pooled-spares  configuration  with  an  analog  multiplexer 
between  the  modulators  and  the  r.f.  system  becomes  an  attractive 
possibility.  The  desirability  of  such  design  modifications  should 
be  examined. 

Task  6:  Design  Verification 

It  is  advisable,  when  dealing  with  a system  as  complex  as  the 
WBSP,  to  verify  that  the  design  is  correct  before  committing  it  to 
hardware.  One  method  for  doing  this  is  to  simulate  the  design  on  a 
computer.  Such  a simulation  was  in  fact  written  for  the  basic 
(Encoder/Decoder)  processor  during  the  present  study  using  a 
higher-order simulator language called CDL (computer design
language). The product of this effort is described in Appendix
B.  It  is  proposed  that  this  simulation  be  extended  to  include 
the  Demodulator  and  Interleaver/De-interleaver  processors  as  well 
and  that  extensive  runs  be  made  to  verify  that  all  of  the  processing 
algorithms  perform  as  required. 

Phase  II  FABRICATION  AND  TEST 

Task  7:  Brassboard  Fabrication 

Once the design has been completed and its correctness verified,
it is proposed that a brassboard be constructed and used to test
and demonstrate the WBSP concept. Initially, only one of each of
the major WBSP elements (Bus Controller, A/D Converter, Demodulator,
De-interleaver, Decoder and Modulator) would be needed. This
"single-string" brassboard would be sufficient to demonstrate the
performance characteristics of the WBSP as a communications processor;
all of the processing algorithms could be demonstrated
individually and in the appropriate combinations. It would be
desirable, however, eventually to add the multiple elements needed
to demonstrate a full-scale WBSP configuration with all the various
signal processing tasks taking place simultaneously. Such a full-scale
brassboard could then be coupled with the FTSC (the existing
FTSC brassboard is recommended for this purpose) in order to test
and demonstrate the integrated WBSP-FTSC system.


It should be noted that a WBSP brassboard could be constructed
quite economically by again taking advantage of the FTSC program.
An engineering test model (ETM) of the FTSC is currently under
development. The ETM will be constructed from logic cards implemented
with existing SSI and MSI logic. These cards, however, will
be functionally identical with the LSI devices being developed for
the FTSC and will be interchangeable with them. Since the WBSP
is designed to use these same LSI devices, most of the WBSP brassboard
can be constructed simply by making extra copies of the cards
already being fabricated for the ETM. Such a procedure will obviously
reduce significantly the time and expense needed to construct a
WBSP brassboard.

Task  8:  Test  and  Demonstration 

The  purpose  of  this  task  is  to  carry  out  the  performance  tests 
referred  to  in  the  previous  task  description.  The  single-string 
brassboard  would  be  used  to  obtain  communication  system  performance 
parameters (e.g., bit-error rate vs. signal-to-noise and
signal-to-jammer ratios for all modulation and coding schemes of interest).

The  full-scale  brassboard  would  be  used  primarily  to  test  and 
demonstrate  the  WBSP's  ability  to  handle  multiple  channels,  the 
FTSC's  routing  and  fault-monitoring  capability  and  the  integrated 
system's  tolerance  to  faults. 
APPENDIX A

FFT PROCESSING AND QUANTIZATION NOISE

A.1 INTRODUCTION

Processor complexity and signal processing speed are directly
related to the length of the registers. Due to the inherently
nonlinear nature of fixed-point arithmetic implemented with
finite-length registers, however, the resulting quantization errors are
difficult to estimate analytically. Although a linear quantization
noise model can be used, the results obtained from it are often
unreliable due to the fact that quantization noise depends heavily
on the dynamic nature of the input signal. In general, it is not
adequate to assume that quantization noise is independent of the
input signal, and as a result of the cross-correlation between the
input signal and the quantization noise, prediction of the induced
noise level is a difficult task.

Accordingly,  a computer  simulation  program  was  written  so  that 
the effect of performing FFTs using finite-length registers could be
evaluated.  Noise  levels  were  measured  by  applying  random  input 
signals  and  parametric  curves  were  obtained  relating  the  noise  and 
signal  levels. 

The FFT algorithm assumed for purposes of this discussion is
the conventional "decimation-in-frequency" FFT performed using b-bit
data words. It is further assumed that scaling is accomplished
dynamically: that is, when an overflow occurs in an iteration, the
entire data base is scaled by 1/2 and the iteration is continued at
the point at which the overflow occurred.

The assumed quantization function is given by the following
expression:

\[ \text{Integer Part of} \left[ \frac{x}{x_{\max}} \left( 2^{b-1} - 1 \right) \right] \]

where x is the voltage to be quantized and x_max is the
A/D range, taken arbitrarily to be 10 volts for the data and
1 for the sine and cosine tables.
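
A minimal sketch of this quantization function follows. The function name quantize and the argument names are illustrative; "integer part" is taken to mean truncation toward zero, which is an assumption about how negative voltages were handled in the original simulation.

```python
def quantize(x, b, x_max=10.0):
    """Return the b-bit integer code for voltage x.

    Implements Integer Part of [(x / x_max) * (2**(b - 1) - 1)], with the
    integer part taken by truncation toward zero (an assumption).
    """
    return int(x / x_max * (2 ** (b - 1) - 1))


if __name__ == "__main__":
    for volts in (10.0, 5.0, -5.0, 10 / 64):
        print(volts, quantize(volts, 8))
```

With b = 8 and a 10-volt range, full scale maps to 127, and a tone of 10/64 volts, the amplitude used in the FDM simulations of this appendix, maps to a single quantization step.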

A.2 SIMULATION RESULTS

A.2.1 BANDLIMITED GAUSSIAN NOISE PLUS TONE

In this simulation, a 32-point FFT was taken of a single-tone
signal plus Gaussian noise. The tone (in frequency bin 4) had a
5-volt peak amplitude and the noise had an rms value of 1 volt in
all frequency bins except bins 13 through 18. These bins were
left empty in order to enable spectral leakage measurements. The
results of this simulation are shown in Figures A-1 through A-6
for data word lengths b varying from 16 bits to 8 bits. A "perfect"
(48-bit) transform was also made for reference purposes
(Figure A-1). These results are summarized in Figure A-7.

A.2.2 FDM SIGNALS

Several simulations were made to determine interchannel interference
effects caused by data word truncation. These simulations
modeled m 8-ary FSK channels with m = 8, 16, 32, 64. The active
tone in each channel and its phase (any one of 8 equally spaced
values) were chosen using a pseudo-random sequence. The spectrum
produced by taking an N = 8m-point FFT of the composite signal was
then determined. The results for m = 64 are shown in Figures A-8
through A-13 for various data word lengths b. These results,
along with similar results for m = 8, 16 and 32, are summarized in
Figure A-14, which shows the signal-to-quantization-noise ratio as a
function of N and b. For this simulation, the active tones were
each assumed to have an amplitude of 10/64 volts and the quantization
noise was averaged over all frequency bins. Figure A-15 shows
the same results when the composite signal consists of equal-amplitude
signals having a combined rms value of 1.25 volts (so
that the individual signal amplitudes decrease as m increases).

Figure A-2  TONE PLUS GAUSSIAN NOISE SPECTRUM (b=16)

Figure A-4  TONE PLUS GAUSSIAN NOISE SPECTRUM (b=12)

Figure A-16 shows the results of the simulation when the
composite signal consists of 64 8-ary tones (N = 512), as before,
with each tone having a 10/64 volt amplitude, but now a tone jammer
occupies one frequency bin with a signal having a 100/64 volt
amplitude. An 8-bit word length was used for this simulation. As
can be seen, the strong coherent signal tends to suppress some of
the quantization noise terms while creating strong harmonics.
Nevertheless, the average noise level is still about 33 dB below the
signal level.

Finally, Figure A-17, which shows the results of the N = 512
(m = 64) point FFT simulation when the rms signal level is varied
from 0.25 volts to 7 volts, demonstrates that the
signal-to-quantization-noise ratio is relatively independent of the
signal amplitude.
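The FDM experiment described in this section can be approximated with a short sketch. It is a simplified stand-in, not the study's simulation: only the input samples are truncated to b bits (the study truncated data words inside the FFT), the full-scale range is assumed to be m times the tone amplitude, and saturation is not modeled. The function names and the seed are hypothetical.

```python
import cmath
import math
import random


def fdm_composite(m, amp, rng):
    """Sum of m complex tones, one per 8-ary FSK channel (N = 8*m samples)."""
    n = 8 * m
    # Channel k owns bins 8k..8k+7; one bin and one of 8 phases is drawn at random.
    tones = [(8 * k + rng.randrange(8), 2 * math.pi * rng.randrange(8) / 8)
             for k in range(m)]
    return [sum(amp * cmath.exp(1j * (2 * math.pi * f * t / n + ph))
                for f, ph in tones)
            for t in range(n)]


def truncate(x, b, full_scale):
    """Truncate real and imaginary parts to b-bit words spanning +/- full_scale."""
    step = 2 * full_scale / 2 ** b
    return [complex(math.floor(z.real / step) * step,
                    math.floor(z.imag / step) * step) for z in x]


def dft(x):
    """Direct DFT scaled by 1/N, so a tone of amplitude A yields a bin value of A."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
            for k in range(n)]


def sqnr_db(m, b, amp=10 / 64, seed=1):
    """Tone power over quantization-noise power averaged across all bins, in dB."""
    rng = random.Random(seed)
    x = fdm_composite(m, amp, rng)
    xq = truncate(x, b, full_scale=m * amp)  # assumed full-scale range
    err = dft([q - z for q, z in zip(xq, x)])
    noise = sum(abs(e) ** 2 for e in err) / len(err)
    return 10 * math.log10(amp ** 2 / noise)
```

Calling `sqnr_db(8, 8)` and `sqnr_db(8, 12)` compares two word lengths for the m = 8 case. Because the sketch omits truncation inside the FFT, its absolute values should not be expected to match Figure A-14.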

A.3  CONCLUSIONS

On the basis of these investigations, it can be tentatively
concluded that an 8-bit data word is adequate for SSS demodulator
processing tasks. The signal-to-quantization-noise ratios ranged
from 33 to 40 dB for the cases examined here. Presumably, the
signal-to-thermal-noise ratios will typically be in the 0 to 10 dB
range. Consequently, the quantization noise caused by restricting
the processor to 8-bit word lengths does not appear to make a
significant contribution to the overall noise level.
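The size of that contribution can be checked with a small calculation. The sketch below assumes the thermal and quantization noise are independent, so that their powers add; the function names are illustrative only.

```python
import math


def combined_snr_db(thermal_snr_db, sqnr_db):
    """Overall SNR when thermal and quantization noise powers simply add."""
    noise = 10 ** (-thermal_snr_db / 10) + 10 ** (-sqnr_db / 10)
    return -10 * math.log10(noise)


def degradation_db(thermal_snr_db, sqnr_db):
    """Loss in SNR, in dB, attributable to the quantization noise."""
    return thermal_snr_db - combined_snr_db(thermal_snr_db, sqnr_db)


# Least favorable pairing of the quoted ranges: highest thermal SNR, lowest SQNR.
worst = degradation_db(10.0, 33.0)
# Most favorable pairing: lowest thermal SNR, highest SQNR.
best = degradation_db(0.0, 40.0)
```

Under this assumption, even the least favorable pairing (10 dB thermal, 33 dB quantization) costs only about 0.02 dB.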
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APPENDIX  B 

CORE  PROCESSOR  CDL  DESCRIPTION 
B.1  INTRODUCTION

To facilitate both its evaluation and that of the microroutines
to be executed on it, the core processor was modeled
using computer design language (CDL). Because of CDL limitations,
some interim holding registers and additional clocking had to be
added to the model. Nevertheless, the CDL model is a functionally
exact model of the core processor. The core processor can execute
an ALU operation in one clock cycle and an I/O or memory reference
instruction in two cycles. The CDL model effectively does the
same, but up to six subcycles may be used during a single
master-clock cycle.

B.2  CDL  ARCHITECTURE 

The CDL model is divided into two basic sections. The
first is the DECLARATION section, in which all the hardware for the
processor is defined. The second consists of the LABELED statements,
which define the buses and system flow of the processor.
It is in this second section that the clock labels are used; they
indicate what information is to be on what bus at what time. In
this particular model, six cycles are actually involved in an ALU
function and one additional cycle is used to test whether the proper
number of iterations has been completed.

B.3  CDL MODEL DESCRIPTION

The first clock cycle, /P*START/, takes the control RAM word at
the location specified by the control RAM address register, divides
its 40 bits into the proper control RAM fields, and loads them into
the proper registers.
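The field split performed by /P*START/ can be sketched as follows. The field names and widths are taken from the REGISTER declarations in the listing of Section B.4. Two points are assumptions: the fields are packed least-significant first, in declaration order, and the whole word is held in a single integer (the model keeps RF15 and RF16 in a separate control RAM extension). The declared widths total 39 bits, so one bit of the 40 is presumably spare. The two function names are hypothetical.

```python
# Field widths in bits, in the order the fields are declared in the CDL model.
FIELD_WIDTHS = [
    ("RF1", 3), ("RF2", 8), ("RF3", 3), ("RF4", 3), ("RF5", 1), ("RF6", 2),
    ("RF7", 1), ("RF8", 3), ("RF9", 1), ("RF10", 1), ("RF11", 3), ("RF12", 3),
    ("RF13", 1), ("RF14", 3), ("RF15", 2), ("RF16", 1),
]


def decode_control_word(word):
    """Split a control word into its fields, least-significant field first."""
    fields = {}
    shift = 0
    for name, width in FIELD_WIDTHS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields


def encode_control_word(**values):
    """Inverse of decode_control_word; fields that are not given default to 0."""
    word = 0
    shift = 0
    for name, width in FIELD_WIDTHS:
        value = values.get(name, 0)
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word
```

For example, `decode_control_word(encode_control_word(RF14=7, RF11=3))` returns a dictionary in which RF14 is 7 (an ALU operation) and RF11 is 3.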
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After  the  first  clock  cycle  is  completed,  two  Major  Labels 
are evaluated (/=RF14.EQ.7=/ and /=RF14.LT.6=/). If the first
condition is true, an ALU function is specified and the following
procedure is executed:

The proper format for the information transfer from the ANS
and WRYREG registers is specified during the second clock cycle,
/CLK(0)/. This information is stored in temporary holding registers
denoted GPREG and WKREG.

In the third clock cycle, /CLK(1)/, a write clock is checked on
both the working (WKREG) and general-purpose (GPREG) registers.
If either is high, the proper register input is selected and the
contents of GPREG or WKREG are stored into the specified location.
The working and general-purpose register arrays are each modeled
as an 8 x 8 memory array.

During the fourth cycle, /CLK(2)/, the ALU inputs are selected
and stored in temporary holding registers denoted ALUA and ALUB.

The specified ALU function is then executed during the next
cycle, /CLK(3)/, and the result stored in a nine-bit register denoted
ANS.
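The ALU step can be sketched from the ALU FUNCTION SELECT multiplexer in the listing of Section B.4. The function codes below follow that table as far as it can be read, with the 4-bit select formed from the carry input RF13 (high bit) and RF11. Masking the result to nine bits and forming complements over eight bits are assumptions about how the CDL simulator treats word lengths.

```python
MASK8 = 0xFF   # ALUA and ALUB are 8-bit holding registers
MASK9 = 0x1FF  # ANS is a nine-bit register


def alu_select(rf13, rf11):
    """Form the 4-bit ALU function code from the carry input and RF11."""
    return ((rf13 & 1) << 3) | (rf11 & 0o7)


def alu(alu_f, a, b):
    """Return the nine-bit result for ALU function code alu_f (0..15)."""
    a &= MASK8
    b &= MASK8
    not_a = ~a & MASK8
    not_b = ~b & MASK8
    table = {
        0o00: a & b,      0o01: a | b,      0o02: not_a & b,  0o03: a ^ b,
        0o04: a + b,      0o05: a - b - 1,  0o06: b - a - 1,  0o07: not_b,
        0o10: a & b,      0o11: a | b,      0o12: not_a & b,  0o13: a ^ b,
        0o14: a + b + 1,  0o15: a - b,      0o16: b - a,      0o17: not_b + 1,
    }
    return table[alu_f] & MASK9
```

With the carry input set, the arithmetic codes become add-with-carry, true subtraction, and two's-complement negation of ALUB; the logical codes are unaffected.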

The second major label, /=RF14.LT.6=/, indicates a memory or
I/O operation. In this case, a further check is made to determine
whether an I/O or memory reference is specified. In the case of an
I/O operation, a further determination is made to see if it is an
input or output operation, and either the contents of the input
register, INREG, are transferred to ANS or the contents of ANS are
transferred to the output register, OUTREG. If a memory reference
is specified, the store/fetch control RAM field is checked and the
scratch-pad RAM address is defined, depending upon another control
RAM field, either directly by the contents of one of the


working  registers  or  by  these  contents  incremented  by  32  or  64. 

Once  the  proper  address  and  store  or  fetch  decisions  have  been  made, 
the  data  are  transferred. 
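The memory and I/O path can be sketched as below. The RF14 coding (0, 1, 2 for a memory reference with offsets of 0, 32 and 64; 4 for input; 5 for output) and the use of RF16 = 0 for a store are readings of the listing in Section B.4, not statements from the text. Wrapping the address to eight bits and storing only the low eight bits of ANS are further assumptions. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    ans: int = 0
    inreg: int = 0
    outreg: int = 0
    wkrmem: list = field(default_factory=lambda: [0] * 8)    # working registers
    ram: list = field(default_factory=lambda: [0] * 256)     # scratch-pad RAM


def ram_address(state, rf4, rf14):
    """Working-register contents, optionally incremented by 32 or 64."""
    offset = {1: 32, 2: 64}.get(rf14, 0)
    return (state.wkrmem[rf4] + offset) & 0xFF  # assumed wrap to 8 bits


def memory_or_io(state, rf4, rf14, rf16):
    """One memory or I/O operation, selected by RF14 (must be below 6)."""
    if rf14 <= 2:
        addr = ram_address(state, rf4, rf14)
        if rf16 == 0:
            state.ram[addr] = state.ans & 0xFF   # store
        else:
            state.ans = state.ram[addr]          # fetch
    elif rf14 == 4:
        state.ans = state.inreg                  # input
    elif rf14 == 5:
        state.outreg = state.ans & 0xFF          # output
```

With working register 5 holding 7 (octal), a direct reference addresses location 7 and a reference incremented by 32 addresses location 47 (octal), which agrees with the store and fetch addresses quoted for the sample run in Section B.4.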


At  this  point,  both  sections,  the  ALU  function  and  the  I/O  or 
memory  request,  merge.  The  following  instructions  are  common  to 
both  operations: 

In  the  last  CPU  cycle,  /CLK(4)/,  the  branch  variable  to  be 
used  in  the  next  CPU  cycle  is  checked  and  the  next  control  RAM 
address  is  created. 


There  is  one  more  label  statement,  /CLK(5)/,  during  which  a 
check  is  made  to  determine  if  the  proper  number  of  iterations  has 
been  completed.  This  label  has  no  equivalent  in  the  hardware  and 
simply  is  a software  tool. 
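The order in which the labels fire during one iteration can be summarized in a short sketch. It assumes, from the listing, that the single memory or I/O step is also labeled /CLK(0)/ and that RF14 values of 6 are not used by a control word; both function names are hypothetical.

```python
def labels_for_iteration(rf14):
    """Labels executed, in order, for one pass through the model."""
    if rf14 == 7:
        # ALU function: input muxes, register write, ALU inputs, ALU execute.
        body = ["CLK(0)", "CLK(1)", "CLK(2)", "CLK(3)"]
    elif rf14 < 6:
        # Memory or I/O request: a single step, then on to the common part.
        body = ["CLK(0)"]
    else:
        raise ValueError("RF14 value not covered by the description")
    # Both paths merge at CLK(4); CLK(5) is the software-only iteration check.
    return ["P*START"] + body + ["CLK(4)", "CLK(5)"]


def run(rf14_per_iteration, limit=7):
    """Trace the labels for successive iterations until `limit` is reached."""
    trace = []
    runs = 0
    for rf14 in rf14_per_iteration:
        trace.append(labels_for_iteration(rf14))
        runs += 1
        if runs == limit:
            break
    return trace
```

An ALU iteration therefore passes through six hardware-equivalent labels (/P*START/ through /CLK(4)/) plus the /CLK(5)/ check, which matches the cycle count given in Section B.2.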


B.4  CDL  SAMPLE  RUN 


The following pages illustrate a sample run of seven iterations
incorporating I/O, memory, and ALU operations.


Iteration   Starting Label   Function

    1       /P*START/        Input operation (005₈ from INREG)
    2       /P*START/        ALU function (ALUA.XOR.ALUB)
    3       /P*START/        Memory store (RAM ADDRESS 7₈)
    4       /P*START/        ALU function (ALUA.OR.ALUB)
    5       /P*START/        Memory fetch (RAM ADDRESS 047₈)
    6       /P*START/        ALU function (ALUA.ADD.ALUB)
    7       /P*START/        Output operation (100₈ to OUTREG)
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CALL  . CDLP(  INPUT- 1 IIPTST7) 


•TRANSLATE 

C 

c 

c 

•MAIN 

C
C**********************************************************************
C
CLOCK, P
C
C**********************************************************************
C
REGISTER,
     1 RF1(2-0),      $ CONTROL RAM SEQUENCER BRANCH ADD
     1 RF2(7-0),      $ CONTROL RAM NEXT ADDRESS
     1 RF3(2-0),      $ WORKING REG WX OUTPUT SELECT
     1 RF4(2-0),      $ WORKING REG WY AND GENERAL PUR REG RA
     1 RF5,           $ WORKING REGISTER WX/WY INPUT SELECT
     1 RF6(1-0),      $ 4:1 MUX INPUT FOR REGISTER ARRAY
     1 RF7,           $ WORKING REGISTER WRITE CLOCK
     1 RF8(2-0),      $ GENERAL PUR REG RB OUTPUT SELECT
     1 RF9,           $ GENERAL PUR REG RA/RB INPUT SELECT
     1 RF10,          $ GENERAL PUR REG WRITE CLOCK
     1 RF11(2-0),     $ ALU FUNCTION SELECT
     1 RF12(2-0),     $ ALU A,B MULTIPLEXER INPUT
     1 RF13,          $ CARRY INPUT INTO ALU
     1 RF14(2-0),     $ I/O BUFFER + MEMORY REQUEST
     1 RF15(1-0),     $ SPECIAL FUNCTION DECODE
     1 RF16           $ READ/WRITE CONTROL
REGISTER,
     1 START,         $ START BIT FOR A RUN
     1 T(3-0),        $ SEQUENCER CLOCK
     1 RUNS(7-0),     $ COUNTS THE NUMBER OF RUNS
     1 ANS(10-0),     $ HOLDING REG FOR ALU + I/O, MEM REQUEST
     1 WRYREG(7-0),   $ HOLDING REG FOR WY OUTPUT
     1 GPRADD(2-0),   $ DUMMY REG NEEDED TO SPECIFY GP MEMORY
     1 WKRADD(2-0),   $ DUMMY REG NEEDED TO SPECIFY WK MEMORY
     1 GPREG(7-0),    $ HOLDING REG FOR INPUT TO GP MEMORY
     1 WKREG(7-0),    $ HOLDING REG FOR INPUT TO WK MEMORY
     1 ALUA(7-0),     $ A INPUT INTO ALU
     1 ALUB(7-0),     $ B INPUT INTO ALU
     1 BVREG(6-0),    $ BRANCH VARIABLE REGISTER
     1 INREG(7-0),    $ INPUT DATA REGISTER
     1 OUTREG(7-0),   $ OUTPUT DATA REGISTER
     1 ROMADD(7-0),   $ ADDRESS REG FOR CONTROL RAM
     1 RAMADD(7-0)    $ DUMMY REG NEEDED TO SPECIFY RAM MEM
C
C**********************************************************************
C
DECODER,
     1 CLK=T
C
C**********************************************************************
C
TERMINAL,
     1 ALUF(3-0)=RF13-RF11            $ DETERMINES ALU FUNCTION
C
C**********************************************************************
C
MEMORY,
     1 RAM(RAMADD)=RAM(0-377,7-0),     $ SCRATCHPAD RAM 256 X 8
     1 ROM(ROMADD)=ROM(0-377,43-0),    $ CONTROL RAM 256 X 36
     1 ROM1(ROMADD)=ROM1(0-377,3-0),   $ CONTROL RAM EX 256 X 3
     1 GPRMEM(GPRADD)=GPRMEM(0-7,7-0), $ GENERAL PUR MEM 8 X 8
     1 WKRMEM(WKRADD)=WKRMEM(0-7,7-0)  $ WORKING REG MEM 8 X 8
C
C**********************************************************************
C
C     BRANCH CONDITION MULTIPLEXER
C
MULTIPLEXER, ROMBRA(1-0)=RF1,
     1 0, RF2(7)-RF2(6);
     1 1, BVREG(0)-BVREG(1);
     1 2, BVREG(1)-BVREG(4);
     1 3, BVREG(2)-RF2(6);
     1 4, BVREG(3)-RF2(6);
     1 5, BVREG(4)-RF2(6);
     1 6, BVREG(5)-RF2(6);
     1 7, BVREG(6)-RF2(6)
C
C**********************************************************************
C
C     GENERAL PURPOSE
C
MULTIPLEXER, GPRMUX(7-0)=RF6,
     1 0, WRYREG;
     1 1, ANS;
     1 2, ANS.SHFTR.1;
     1 3, ANS.SHFTL.1
C
C
C**********************************************************************
C
C     WORKING REGISTER MULTIPLEXER INPUT SELECT
C
MULTIPLEXER, WKMUX(7-0)=RF6,
     1 0, ANS;
     1 1, WRYREG;
     1 2, WRYREG.SHFTR.1;
     1 3, WRYREG.SHFTL.1
C
C**********************************************************************
C
C     GENERAL PURPOSE REGISTER RA/RB INPUT SELECT
C
MULTIPLEXER, MUX1(2-0)=RF9,
     1 0, RF4;
     1 1, RF8
C
C**********************************************************************
C
C     WORKING REGISTER WX/WY INPUT SELECT
C
MULTIPLEXER, MUX2(2-0)=RF5,
     1 0, RF3;
     1 1, RF4
C
C**********************************************************************
C
C     ALU A MULTIPLEXER INPUT
C
MULTIPLEXER, ALUAMX(7-0)=RF12,
     1 0, GPRMEM(RF8);
     1 1, GPRMEM(RF4);
     1 2, GPRMEM(RF8);
     1 3, GPRMEM(RF4);
     1 4, GPRMEM(RF8);
     1 5, WKRMEM(RF3);
     1 6, GPRMEM(RF8);
     1 7, WKRMEM(RF3)
C
C**********************************************************************
C
C     ALU B MULTIPLEXER INPUT
C
MULTIPLEXER, ALUBMX(7-0)=RF12,
     1 0, WKRMEM(RF3);
     1 1, WKRMEM(RF3);
     1 2, WKRMEM(RF3).SHFTL.1;
     1 3, 000;
     1 4, GPRMEM(RF4);
     1 5, GPRMEM(RF4);
     1 6, 000;
     1 7, 000
C
C**********************************************************************
C
C     ALU FUNCTION SELECT
C
MULTIPLEXER, SUMREG(10-0)=ALUF,
     1 0,  ALUA.AND.ALUB;
     1 1,  ALUA.OR.ALUB;
     1 2,  ALUA'.AND.ALUB;
     1 3,  ALUA.XOR.ALUB;
     1 4,  ALUA.ADD.ALUB;
     1 5,  (ALUA.SUB.ALUB).SUB.1;
     1 6,  (ALUB.SUB.ALUA).SUB.1;
     1 7,  ALUB';
     1 10, ALUA.AND.ALUB;
     1 11, ALUA.OR.ALUB;
     1 12, ALUA'.AND.ALUB;
     1 13, ALUA.XOR.ALUB;
     1 14, (ALUA.ADD.ALUB).COUNT.;
     1 15, ALUA.SUB.ALUB;
     1 16, ALUB.SUB.ALUA;
     1 17, ALUB'.COUNT.
C
C**********************************************************************
C
C     RAM ADDRESS MODIFIER
C
MULTIPLEXER, MUX3(7-0)=RF14,
     1 0, WKRMEM(RF4);
     1 1, WKRMEM(RF4).ADD.40;
     1 2, WKRMEM(RF4).ADD.100;
     1 3, WKRMEM(RF4);
     1 4, WKRMEM(RF4);
     1 5, WKRMEM(RF4);
     1 6, WKRMEM(RF4);
     1 7, WKRMEM(RF4)
C
C**********************************************************************
C
TERMINAL,
     1 ADDRA1(7-0)=MUX3,   $ SELECTS PROPER ADDRESS FOR STORE FETCH
     1 ADDRS1(2-0)=MUX1,   $ GENERAL PURPOSE REGISTER ADDRESS
     1 ADDRS2(2-0)=MUX2    $ WORKING REGISTER ADDRESS
C
C**********************************************************************
C
C     TAKES OUTPUT FROM ROM AND LOADS IT INTO REGISTERS
C
/P*START/  RF1=ROM(ROMADD)(2-0),
           RF2=ROM(ROMADD)(12-3),
           RF3=ROM(ROMADD)(15-13),
           RF4=ROM(ROMADD)(20-16),
           RF5=ROM(ROMADD)(21),
           RF6=ROM(ROMADD)(23-22),
           RF7=ROM(ROMADD)(24),
           RF8=ROM(ROMADD)(27-25),
           RF9=ROM(ROMADD)(30),
           RF10=ROM(ROMADD)(31),
           RF11=ROM(ROMADD)(34-32),
           RF12=ROM(ROMADD)(37-35),
           RF13=ROM(ROMADD)(40),
           RF14=ROM(ROMADD)(43-41),
           RF15=ROM1(ROMADD)(1-0),
           RF16=ROM1(ROMADD)(2),
           START=0, T=0


C
C     IS THIS AN ALU FUNCTION -YES DO CLK(0)-CLK(3)
C
/=RF14.EQ.7=/
C
C     EXECUTE ALL INPUT MUX'S FOR REGISTER ARRAYS
C
/CLK(0)/   GPREG=GPRMUX, START=0, T=T.COUNT., WKREG=WKMUX
C
C     SELECT PROPER INPUT REGISTERS
C
/CLK(1)/   IF(RF10.EQ.1)THEN(GPRMEM(ADDRS1)=GPREG),
           IF(RF7.EQ.1)THEN(WKRMEM(ADDRS2)=WKREG),
           T=T.COUNT., ANS=0
C
C     SELECT PROPER INPUTS TO ALU
C
/CLK(2)/   ALUA=ALUAMX, ALUB=ALUBMX, START=0, T=T.COUNT.,
           WRYREG=WKRMEM(RF4)
C
C     EXECUTE ALU FUNCTION
C
/CLK(3)/   ANS=SUMREG, T=T.COUNT., RF14=6
C
C     IS THIS A MEMORY OR I/O OPERATION-YES DO CLK(0)
C
/=RF14.LT.6=/
C
C     EXECUTE MEMORY OR I/O OPERATION
C
/CLK(0)/   IF(RF14.LE.2)THEN(IF(RF16.EQ.0)THEN
           (RAM(ADDRA1)=ANS)ELSE(ANS=RAM(ADDRA1))),
           IF(RF14.EQ.4)THEN(ANS=INREG)
           ELSE(IF(RF14.EQ.5)THEN(OUTREG=ANS)),
           T=T.ADD.4, RF14=6, WRYREG=WKRMEM(RF4),
           RAMADD=MUX3
C
C     THIS SECTION CHECKS BRANCH VARIABLES
C     TO BE USED IN THE NEXT CPU CYCLE
C
/CLK(4)/   IF(ANS(7).EQ.1)THEN(BVREG(0)=1)ELSE(BVREG(0)=0),
           IF(ANS(6).EQ.1)THEN(BVREG(1)=1)ELSE(BVREG(1)=0),
           IF(ANS(0).EQ.1)THEN(BVREG(2)=1)ELSE(BVREG(2)=0),
           IF(ANS(1).EQ.1)THEN(BVREG(3)=1)ELSE(BVREG(3)=0),
           IF(RF15.EQ.3)THEN(BVREG(4)=ANS(10)),
           IF(ANS.AND.777.EQ.000)THEN(BVREG(5)=1)ELSE(BVREG(5)=0),
           IF(ANS(5).EQ.1)THEN(BVREG(6)=1)ELSE(BVREG(6)=0),
           T=T.COUNT., RUNS=RUNS.COUNT.,
           ROMADD=ROMBRA-RF2(5-0)   $ NEXT CONTROL RAM ADD
C
C**********************************************************************
C
C     CHECKS FOR THE PROPER NUMBER OF ITERATIONS
C
/CLK(5)/   IF(RUNS.EQ.7)THEN(START=0)ELSE(START=1), T=T.COUNT.
C
C**********************************************************************
C
END


tSIMULATE 


SIMULATION 


PROGRAM  I I N(il  II  FOR  tS  1 Mill  Af  I flN  IS  00114212 
COW.  Si/I  AIIUCA1ION  IS  00002345 

MACHINE  HARDWARE  ALLGCAIION  IN  CPC  UORDS  IS  00000534 


♦LOAD 


*OUTPUT
           START, RUNS, ANS, T,
           RF1, RF2, RF3, RF4,
           RF5, RF6, RF7, RF8,
           RF9, RF10, RF11, RF12,
           RF13, RF14, RF15, RF16,
           GPRMEM(0), GPRMEM(1), GPRMEM(2), GPRMEM(3),
           GPRMEM(4), GPRMEM(5), GPRMEM(6), GPRMEM(7),
           WKRMEM(0), WKRMEM(1), WKRMEM(2), WKRMEM(3),
           WKRMEM(4), WKRMEM(5), WKRMEM(6), WKRMEM(7),
           GPREG, WKREG, ALUA, ALUB,
           WRYREG, BVREG, OUTREG, INREG,
           ROMADD, ROM(ROMADD), RAMADD, RAM(RAMADD)

*LOAD
           START=1,
           GPRADD=00,
           WKRADD=00,
           GPREG=000,
           WKREG=000,
           GPRMEM(0-7)=0,0,0,0,0,0,0,0,
           WKRMEM(0-7)=0,0,0,0,0,7,0,0,
           ALUA=0,
           ALUB=0,
           T=10,
           ANS=0,
           ROMADD=000,
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